Commutative hash function for uint32_t value pairs

Keywords: commutative hash function, uint32. Updated: 2023-10-16

I need a fast, simple hash function that creates a unique identifier for a pair of uint32_t values, such that (2,7) and (7,2) produce the same hash.

Any ideas?

To answer my own question, the solution is:

uint64_t hash(uint32_t x, uint32_t y)
{
    const uint64_t a = static_cast<uint64_t>(x);
    const uint64_t b = static_cast<uint64_t>(y);
    if (x < y) return (b << 32) | a;
    else return (a << 32) | b;
}

This can be improved to a branchless version:

uint64_t hash(uint32_t x, uint32_t y)
{
    const uint64_t a = static_cast<uint64_t>(x);
    const uint64_t b = static_cast<uint64_t>(y);
    const uint64_t h0 = (b << 32) | a;
    const uint64_t h1 = (a << 32) | b;
    return (x < y) ? h0 : h1; // compilers typically emit a conditional move (CMOV) here
}

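These methods are perfect hash functions: distinct unordered pairs always map to distinct 64-bit values, so collisions cannot occur. Their drawback is that the inputs are limited to 32 bits (values up to 2^32 - 1), because the two values together fill the whole 64-bit result.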
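Because the packing is collision-free, the key can also be turned back into the pair. The following is a minimal sketch of that round trip, assuming C++14 or later; the names pair_key, unpack_key and demo_pair_key are hypothetical and only used for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical name: packs an unordered pair into one 64-bit key,
// with the larger value in the high 32 bits.
constexpr std::uint64_t pair_key(std::uint32_t x, std::uint32_t y) {
    const std::uint64_t a = x;
    const std::uint64_t b = y;
    return (x < y) ? ((b << 32) | a) : ((a << 32) | b);
}

// Hypothetical helper: recovers the pair as (larger, smaller).
constexpr std::pair<std::uint32_t, std::uint32_t> unpack_key(std::uint64_t key) {
    return {static_cast<std::uint32_t>(key >> 32),
            static_cast<std::uint32_t>(key & 0xFFFFFFFFu)};
}

// Argument order does not change the key.
static_assert(pair_key(2, 7) == pair_key(7, 2), "order must not matter");

// Small runtime check of the round trip.
inline void demo_pair_key() {
    const auto p = unpack_key(pair_key(2, 7));
    assert(p.first == 7 && p.second == 2);
}
```

A key of this kind can be stored directly in a container such as std::unordered_set<std::uint64_t>, for example to deduplicate undirected edges.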

constexpr uint32_t hash_max = ...;
constexpr uint32_t commutative_hash(uint32_t i, uint32_t j) {
    // Each term is symmetric in i and j; products wrap modulo 2^32.
    return (i*j + (i*i)*(j*j) + (i*i*i)*(j*j*j)) % hash_max;
}

The extra parentheses are there for the compiler: they make the expression easier to optimize.
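Unlike the packing approach, this polynomial is not collision-free: every term depends on i and j only through the product i*j, so pairs with the same product share a hash. The sketch below illustrates this; since the original leaves hash_max elided, the modulus example_hash_max is an arbitrary placeholder, and the function and demo names are likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: arbitrary placeholder modulus, not the value from the original post.
constexpr std::uint32_t example_hash_max = 1000003u;

// Same expression as above, with the placeholder modulus.
constexpr std::uint32_t commutative_hash_example(std::uint32_t i, std::uint32_t j) {
    return (i*j + (i*i)*(j*j) + (i*i*i)*(j*j*j)) % example_hash_max;
}

// Commutative: swapping the arguments gives the same result.
static_assert(commutative_hash_example(2, 7) == commutative_hash_example(7, 2),
              "order must not matter");

// Not perfect: (1,6) and (2,3) both have product 6 and therefore collide.
static_assert(commutative_hash_example(1, 6) == commutative_hash_example(2, 3),
              "pairs with equal products collide");

inline void demo_collision() {
    assert(commutative_hash_example(1, 6) == 258u); // 6 + 36 + 216
}
```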

If you want a fast function, avoid conditional instructions (including std::max/std::min): they can stall the CPU pipeline, which makes them slow.
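For readers who want the collision-free packing without any conditional at the source level, one possibility is the well-known XOR-mask trick for selecting the minimum and maximum. This is a sketch under assumptions, not something from the original answers: the names pair_key_masked and demo_pair_key_masked are hypothetical, and whether it is faster than a ternary or std::min/std::max depends on the compiler and target, so it is worth measuring.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name: packs (max << 32) | min using only arithmetic and bit operations.
constexpr std::uint64_t pair_key_masked(std::uint32_t x, std::uint32_t y) {
    // mask is all ones when x < y, otherwise zero.
    const std::uint32_t mask = 0u - static_cast<std::uint32_t>(x < y);
    // diff is x ^ y when x < y, otherwise zero.
    const std::uint32_t diff = (x ^ y) & mask;
    const std::uint32_t hi = x ^ diff; // larger of x and y
    const std::uint32_t lo = y ^ diff; // smaller of x and y
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

static_assert(pair_key_masked(2, 7) == pair_key_masked(7, 2), "order must not matter");

inline void demo_pair_key_masked() {
    // Larger value ends up in the high half, smaller in the low half.
    assert(pair_key_masked(2, 7) == ((std::uint64_t{7} << 32) | 2u));
}
```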