Why is my std::unordered_map access time not constant?

Updated: 2023-10-16

I wrote some code to test the performance of my unordered map, using a 2-component vector as the key.

std::unordered_map<Vector2i, int> m;
for(int i = 0; i < 1000; ++i)
    for(int j = 0; j < 1000; ++j)
        m[Vector2i(i,j)] = i*j+27*j;
clock.restart();
auto found = m.find(Vector2i(0,5));
std::cout << clock.getElapsedTime().asMicroseconds() << std::endl;

Output of the code above: 56 (microseconds). When I replace 1000 with 100 in the for loops, the output is 2 (microseconds). Shouldn't the time be constant?

The hash function for my Vector2i:

namespace std
{
    template<>
    struct hash<Vector2i>
    {
        std::size_t operator()(const Vector2i& k) const
        {
            using std::size_t;
            using std::hash;
            // XOR the hash of x with the hash of y shifted left by one bit
            return (hash<int>()(k.x)) ^ (hash<int>()(k.y) << 1);
        }
    };
}

Edit: I added this code to count the collisions after the for loops:

size_t collisions = 0;  // number of buckets holding more than one element
for (size_t bucket = 0; bucket != m.bucket_count(); ++bucket)
    if (m.bucket_size(bucket) > 1)
        ++collisions;

With 100*100 elements: collisions = 256

With 1000*1000 elements: collisions = 2048

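A hash table guarantees constant time only on average (amortized), not for every individual access. If the table is well balanced (that is, the hash function is good), most elements are evenly distributed across the buckets. If the hash function is poor, however, there can be many collisions, and accessing an element then usually means traversing a linked list that stores the colliding elements. So first check that the load factor and the hash function are adequate in your case. Finally, make sure you compile in release mode with optimizations enabled (for example, -O3 for g++/clang++).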
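As a rough way to check both points, the sketch below rebuilds the map from the question and reports the longest bucket chain and the average time of a lookup measured over many calls rather than a single one. Vector2i here is a hypothetical stand-in (the original type is not shown), and QuestionHash, build_map, longest_bucket and average_find_ns are names made up for illustration. One observation, under the assumption that std::hash<int> returns its argument unchanged (true of some common standard library implementations, but worth verifying on your platform): the hash from the question can then produce at most 256 distinct values for the 100*100 grid and at most 2048 for the 1000*1000 grid, which is consistent with the collision counts reported in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for the question's Vector2i (the real type is not shown).
struct Vector2i {
    int x, y;
    bool operator==(const Vector2i& o) const { return x == o.x && y == o.y; }
};

// Same combination as the hash in the question, written as a functor.
struct QuestionHash {
    std::size_t operator()(const Vector2i& k) const {
        return std::hash<int>()(k.x) ^ (std::hash<int>()(k.y) << 1);
    }
};

using Map = std::unordered_map<Vector2i, int, QuestionHash>;

// Fill the map with the n*n grid used in the question.
Map build_map(int n) {
    Map m;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            m[Vector2i{i, j}] = i * j + 27 * j;
    return m;
}

// Size of the fullest bucket; a lookup may have to walk this many nodes.
std::size_t longest_bucket(const Map& m) {
    std::size_t longest = 0;
    for (std::size_t b = 0; b != m.bucket_count(); ++b)
        longest = std::max(longest, m.bucket_size(b));
    return longest;
}

// Average duration of one find() in nanoseconds, over `repeats` lookups
// of keys spread across the n*n grid.
double average_find_ns(const Map& m, int n, int repeats) {
    assert(n > 0 && repeats > 0);
    long long checksum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        const int y = static_cast<int>((r * 7LL) % n);
        const auto it = m.find(Vector2i{r % n, y});
        if (it != m.end()) checksum += it->second;
    }
    const auto stop = std::chrono::steady_clock::now();
    volatile long long sink = checksum;  // keep the loop from being optimized away
    (void)sink;
    return std::chrono::duration<double, std::nano>(stop - start).count() / repeats;
}
```

Comparing longest_bucket(build_map(100)) with longest_bucket(build_map(1000)), together with load_factor() on each map, should show whether the chains grow with the element count; if they do, the lookup cost grows with them.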

This question may also be useful: How to create a good hash_combine with 64 bit output (inspired by boost::hash_combine).
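For illustration, here is a minimal sketch of a hash for a two-component key that uses a mixing step in the style of the classic boost::hash_combine formula. It is not the 64-bit variant discussed in the linked question, and it is not claimed to be the best choice for this workload. Vector2i is again a hypothetical stand-in, and Vector2iHash and distinct_hashes are illustrative names.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical stand-in for the question's Vector2i (the real type is not shown).
struct Vector2i {
    int x, y;
    bool operator==(const Vector2i& o) const { return x == o.x && y == o.y; }
};

// Mixing step in the style of the classic boost::hash_combine formula
// (32-bit golden-ratio constant).
inline void hash_combine(std::size_t& seed, std::size_t value) {
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash functor that mixes both components instead of a plain XOR and shift.
struct Vector2iHash {
    std::size_t operator()(const Vector2i& k) const {
        std::size_t seed = 0;
        hash_combine(seed, std::hash<int>()(k.x));
        hash_combine(seed, std::hash<int>()(k.y));
        return seed;
    }
};

// Number of distinct hash values produced for the n*n grid of keys.
// The closer this is to n*n, the fewer keys are forced to share a bucket.
std::size_t distinct_hashes(int n) {
    std::unordered_set<std::size_t> seen;
    const Vector2iHash h;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            seen.insert(h(Vector2i{i, j}));
    return seen.size();
}
```

The functor can be passed as the third template argument, as in std::unordered_map<Vector2i, int, Vector2iHash>, which avoids specializing std::hash. Running distinct_hashes on the grid sizes from the question gives a quick indication of how much better the keys are spread than with the original hash.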