Optimization of a C++ code (that uses unordered_map and vector)

Keywords: unordered map, C++, code optimization. Last updated: 2023-10-16

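I am trying to optimize some parts of a C++ code that take a long time. The part of the code below takes about 19 seconds for X amount of data, and I am trying to finish the whole process in less than 5 seconds for the same amount of data (based on some benchmarks that I have). I have a function "add" that I have written, and I have copied its code here. I will try to explain as much as I think is needed to understand the code. Please let me know if I have missed something.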

For X data entries, the following function add is called X times.

void HashTable::add(PointObject vector)   // PointObject is a user-defined object
{
    int combinedHash = hash(vector);   // the function "hash" takes less than 1 second for X amount of data
   // hashTableMap is an unordered_map<int, std::vector<PointObject>>
   if (hashTableMap.count(combinedHash) == 0)
   {
        // if the hashmap does not contain the combinedHash key, then 
        //  add the key and a new vector
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
   }
   else
   {
        // otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
        auto it = hashTableMap.find(combinedHash);
        if (it != hashTableMap.end())
        {
            std::vector<PointObject> pointVectorList = it->second;
            pointVectorList.push_back(vector);
            it->second = pointVectorList;
        }
   }
}

You are doing a lot of useless operations. If I understand correctly, a simplified form could be simply:

void HashTable::add(const PointObject& vector) {
   hashTableMap[hash(vector)].push_back(vector);    
}

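This works because:

A map accessed with operator[] creates the value (here an empty std::vector) if the key is not already present in the map.
The value (the std::vector) is returned by reference, so you can push_back the incoming point to it directly. This std::vector is either the newly inserted one or the previously existing one (if the key was already in the map).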
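
To make the operator[] behaviour concrete, here is a minimal, self-contained sketch. PointObject, hashPoint and the free function add are hypothetical stand-ins for the question's types (the real PointObject and hash() are not shown in the question), so treat this as an illustration rather than the actual implementation.

```cpp
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the question's PointObject.
struct PointObject {
    int x;
    int y;
};

// Hypothetical stand-in for the question's hash(); any int-returning
// function would do for this illustration.
int hashPoint(const PointObject& p) {
    return p.x * 31 + p.y;
}

using Buckets = std::unordered_map<int, std::vector<PointObject>>;

// operator[] inserts an empty vector the first time a key is seen and
// returns a reference to the stored vector either way, so a single
// push_back covers both the "new key" and "existing key" cases.
void add(Buckets& buckets, const PointObject& p) {
    buckets[hashPoint(p)].push_back(p);
}
```

Adding two points with the same hash and one with a different hash leaves two keys in the map, with the first bucket holding two points.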

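Also note that, depending on the size of PointObject and other factors, passing vector by value could be more efficient than passing it by const PointObject&. This is, however, the kind of micro-optimization that needs profiling to be done sensibly.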

Instead of calling hashTableMap.count(combinedHash) and hashTableMap.find(combinedHash), it is better to just insert the new element and check what insert() returned:

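In versions (1) and (2), the function returns a pair object whose first element is an iterator pointing either to the newly inserted element or to the element with an equivalent key, and whose second element is a bool indicating whether the element was successfully inserted.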
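
The return value described above can be illustrated with a small sketch. IntBuckets and appendToBucket are names made up for this example, and plain int values stand in for PointObject to keep it short.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

using IntBuckets = std::unordered_map<int, std::vector<int>>;

// Inserts an empty bucket for `key` if none exists, appends `value`,
// and reports whether the bucket was newly created.
bool appendToBucket(IntBuckets& buckets, int key, int value) {
    // insert() leaves the map unchanged when the key already exists;
    // either way, result.first points at the element for `key`.
    auto result = buckets.insert(std::make_pair(key, std::vector<int>()));
    result.first->second.push_back(value);
    return result.second;  // true only if a new key was inserted
}
```

Calling appendToBucket twice with the same key returns true the first time and false the second time, while both values end up in the same bucket after a single lookup per call.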

Also, do not pass objects by value where you don't have to. It is better to pass them by pointer or by reference. This:

std::vector<PointObject> pointVectorList = it->second;

is inefficient, because it creates an unnecessary copy of the vector.

This .count() is completely unnecessary; you could simplify your function to:

void HashTable::add(PointObject vector)
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        std::vector<PointObject> pointVectorList = it->second;
        pointVectorList.push_back(vector);
        it->second = pointVectorList;
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

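You are also performing copy operations everywhere. Copying objects is time-consuming, so avoid it. Also use references and pointers where possible: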

void HashTable::add(PointObject& vector)
{
    int combinedHash = hash(vector);
    auto it = hashTableMap.find(combinedHash);
    if (it != hashTableMap.end())
    {
        it->second.push_back(vector);
    }
    else
    {
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(vector);
        hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
    }
}

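This code can probably be optimized further, but that would require knowing hash(), knowing how hashTableMap works (by the way, why is it not a std::map?) and some experimentation.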

If hashTableMap were a std::map<int, std::vector<PointObject>>, you could simplify your function to:

void HashTable::add(PointObject& vector)
{
    hashTableMap[hash(vector)].push_back(vector);
}

And if it were a std::map<int, std::vector<PointObject*>> (pointers), you could even avoid that last copy operation.

Without any if, try inserting an empty entry into the hash table:

auto ret = hashTableMap.insert(
   std::make_pair(combinedHash, std::vector<PointObject>()));

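This either adds a new empty entry or retrieves the one that already exists. In your case you don't need to check which of the two happened; you just take the returned iterator and add the new element: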

auto &pointVectorList = ret.first->second;   // ret.first points to a (key, vector) pair
pointVectorList.push_back(vector);
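
Put together, the two fragments above could look like the following sketch of the whole class. The PointObject layout, the placeholder hash and the bucketCount/bucketSize helpers are assumptions added only so the example compiles on its own; they are not taken from the question.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's PointObject.
struct PointObject {
    int x;
    int y;
};

class HashTable {
public:
    void add(const PointObject& point) {
        const int combinedHash = hash(point);
        // Inserts an empty vector for a new key, or finds the existing entry.
        auto ret = hashTableMap.insert(
            std::make_pair(combinedHash, std::vector<PointObject>()));
        // ret.first->second is the vector stored in the map, not a copy.
        ret.first->second.push_back(point);
    }

    // Helpers for inspection only (not part of the question's class).
    std::size_t bucketCount() const { return hashTableMap.size(); }
    std::size_t bucketSize(const PointObject& point) const {
        auto it = hashTableMap.find(hash(point));
        return it == hashTableMap.end() ? 0 : it->second.size();
    }

private:
    // Placeholder hash; the question's real hash() is not shown.
    static int hash(const PointObject& p) { return p.x * 31 + p.y; }

    std::unordered_map<int, std::vector<PointObject>> hashTableMap;
};
```

Each call to add performs one lookup and one element copy, with no branch on whether the key was already present.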

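Assuming PointObject is big and copying it is expensive, std::move is your friend. You will want to make sure that PointObject is move-aware (either don't define a destructor or copy operations, or provide a move constructor and move-assignment operator yourself).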

void HashTable::add(PointObject vector)   // PointObject is a user-defined object
{
    int combinedHash = hash(vector);   // the function "hash" takes less than 1 second for X amount of data
   // hashTableMap is an unordered_map<int, std::vector<PointObject>>
   if (hashTableMap.count(combinedHash) == 0)
   {
        // if the hashmap does not contain the combinedHash key, then 
        //  add the key and a new vector
        std::vector<PointObject> pointVectorList;
        pointVectorList.push_back(std::move(vector));
        hashTableMap.insert(std::make_pair(combinedHash, std::move(pointVectorList)));
   }
   else
   {
        // otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
        auto it = hashTableMap.find(combinedHash);
        if (it != hashTableMap.end())
        {
            std::vector<PointObject> pointVectorList = it->second;
            pointVectorList.push_back(std::move(vector));
            it->second = std::move(pointVectorList);
        }
   }
}
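
The same idea can be combined with operator[] so that the existing vector is never copied at all. This is only a sketch under assumptions: the PointObject below (with a std::string member so that a move is actually cheaper than a copy) and hashPoint are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical PointObject. No destructor or copy operations are declared,
// so the compiler generates the move constructor and move assignment.
struct PointObject {
    int id;
    std::string label;
};

// Placeholder for the question's hash().
int hashPoint(const PointObject& p) {
    return p.id % 16;
}

using Buckets = std::unordered_map<int, std::vector<PointObject>>;

// Takes the point by value so the caller can either copy or move into it.
void add(Buckets& buckets, PointObject point) {
    // Compute the key first: `point` should not be read after it is moved from.
    const int key = hashPoint(point);
    buckets[key].push_back(std::move(point));
}
```

A caller that no longer needs its object can write add(buckets, std::move(p)); a caller that passes an lvalue without std::move pays for exactly one copy.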

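Using a std::unordered_map here doesn't seem appropriate: you use the int from hash as the key, which (presumably) is the hash of the PointObject rather than the PointObject itself. That is essentially double hashing. And if you need a PointObject in order to compute the map key, then it isn't really a key at all! Perhaps std::unordered_multiset would be a better choice?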

First define the hash function for PointObject:

namespace std
{
    template<>
    struct hash<PointObject> {
        size_t operator()(const PointObject& p) const {
            return ::hash(p);
        }
    };
}

Then something like:

#include <unordered_set>
using HashTable = std::unordered_multiset<PointObject>;
int main()
{
    HashTable table {};
    PointObject a {};
    table.insert(a);
    table.emplace(/* whatever */);
    return 0;
}
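
One detail the snippets above leave implicit is that unordered containers also need an equality comparison for the element type, in addition to the hash. The following self-contained sketch assumes a two-field PointObject and a placeholder hash combination; both are illustrative, not the question's actual definitions.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for the question's PointObject.
struct PointObject {
    int x;
    int y;
};

// Equality is required so the container can tell apart elements whose
// hashes collide.
bool operator==(const PointObject& a, const PointObject& b) {
    return a.x == b.x && a.y == b.y;
}

namespace std {
    template <>
    struct hash<PointObject> {
        size_t operator()(const PointObject& p) const {
            // Placeholder combination of the two coordinates.
            return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
        }
    };
}

// Equal points are kept as separate elements and grouped together.
using PointSet = std::unordered_multiset<PointObject>;
```

With this in place, points.count(p) reports how many stored points compare equal to p, and points.equal_range(p) iterates over them. Note that this groups points by equality rather than by hash value, so it only replaces the original map if that is the grouping you actually want.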

Your biggest problem is that you are copying the entire vector (and every element in that vector) twice in the else part:

std::vector<PointObject> pointVectorList = it->second;  // first copy
pointVectorList.push_back(vector);
it->second = pointVectorList;                           // second copy

This means that every time you add an element to an existing vector, you copy that entire vector.

If you used a reference to that vector, you would do much better:

std::vector<PointObject> &pointVectorList = it->second;
pointVectorList.push_back(vector);
//it->second = pointVectorList; // don't need this anymore.
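
To see how much the two extra vector copies cost, one can count copy operations with an instrumented element type. Everything in this sketch (CountedPoint, addByCopy, addByReference, copiesFor) is hypothetical and exists only to illustrate the difference; it is not a benchmark of the original code.

```cpp
#include <unordered_map>
#include <vector>

// Element type that counts how often it is copied.
struct CountedPoint {
    static long copies;
    int value = 0;

    CountedPoint() = default;
    explicit CountedPoint(int v) : value(v) {}
    CountedPoint(const CountedPoint& o) : value(o.value) { ++copies; }
    CountedPoint& operator=(const CountedPoint& o) {
        value = o.value;
        ++copies;
        return *this;
    }
    // noexcept moves let std::vector move (not copy) when it reallocates.
    CountedPoint(CountedPoint&&) noexcept = default;
    CountedPoint& operator=(CountedPoint&&) noexcept = default;
};
long CountedPoint::copies = 0;

using Map = std::unordered_map<int, std::vector<CountedPoint>>;

// Mirrors the question's else branch: copy out, append, copy back.
void addByCopy(Map& m, int key, const CountedPoint& p) {
    std::vector<CountedPoint> list = m[key];  // copies every element
    list.push_back(p);
    m[key] = list;                            // copies every element again
}

// Appends through a reference to the vector stored in the map.
void addByReference(Map& m, int key, const CountedPoint& p) {
    std::vector<CountedPoint>& list = m[key];
    list.push_back(p);
}

// Adds n points under one key and returns how many copies were made.
long copiesFor(void (*add)(Map&, int, const CountedPoint&), int n) {
    Map m;
    CountedPoint::copies = 0;
    for (int i = 0; i < n; ++i) {
        add(m, 0, CountedPoint(i));
    }
    return CountedPoint::copies;
}
```

Under these assumptions, copiesFor(addByReference, n) makes one copy per added point, while copiesFor(addByCopy, n) grows roughly quadratically in n because each call copies the whole bucket twice.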

As a side note, in your unordered_map you are hashing your value to use as your key. You could use an unordered_set with your hash function instead.