Improving value hashing

Keywords: hashing | Updated: 2023-10-16

This is what I currently have:

#include <cmath>          // sin; note that M_PI is POSIX, not standard C++
#include <unordered_map>

class Transform
{
    int N; // set in other functions
    std::unordered_map<int,float> cache;
    float Wn(int n)
    {
        std::unordered_map<int,float>::const_iterator got = cache.find(n);
        if(got == cache.end())
            return cache[n] = sin((M_PI / (2 * N)) * (n + 0.5f));
        return cache[n];
    }
};

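Since the function Wn is called a LOT, and only the parameter n changes between calls, I tried to cache its results. My problem is that in many cases the function takes longer with the cache than without it, sometimes as much as 25% longer. Is there a way to optimise this?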

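Assuming the input integers usually fall within some small range, just use an array as the cache. Even if some values cannot be cached, that will still be more efficient than hashing.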

I took a closer look at your code and at what can be done to improve it.

Below are three annotated versions of the function Wn():

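While it may not matter much in this case (the optimiser may be able to remove some of the redundant lookups), there is an arguably more idiomatic and correct way to write Wn().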

First, the original version:

float original_cached_wN(int n)
{
    // compute hash and search
    std::unordered_map<int,float>::const_iterator got = cache.find(n);
    if(got == cache.end())
        // recompute hash
        // search again
        // default construct 
        // overwrite
        return cache[n] = compute_wN(n);
    // cache hit:
    // recompute hash
    // search again
    // return the existing value (nothing is constructed or overwritten)
    return cache[n];
}

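An idiomatic improved version, which takes advantage of the fact that the iterator already gives us access to the value, so we don't need the nominally expensive operator[]: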

float improved_cached_wN(int n)
{
    // search
    std::unordered_map<int,float>::const_iterator got = cache.find(n);
    if(got == cache.end())
    {
        // emplace the computed value and recover its location
        // from the returned pair<iterator, bool>
        got = cache.emplace(n, compute_wN(n)).first;
    }
    // got is an iterator: got->first is the key, got->second is the value
    return got->second;
}

Finally, simply computing Wn without any caching:

float compute_wN(int n) const
{
    return sin((M_PI / (2 * N)) * (n + 0.5f));
}

Here is a test program that lets you inspect the compiled code produced for all three methods:

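#include <algorithm>      // std::copy
#include <iostream>
#include <cmath>          // sin (M_PI is POSIX, not standard C++)
#include <iterator>       // std::ostream_iterator
#include <unordered_map>
#include <sstream>
#include <vector>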
class Transform
{
    int N;
    std::unordered_map<int,float> cache;
public:
    Transform(int N) : N(N) {}
    float original_cached_wN(int n)
    {
        std::unordered_map<int,float>::const_iterator got = cache.find(n);
        if(got == cache.end())
            return cache[n] = compute_wN(n);
        return cache[n];
    }
    float improved_cached_wN(int n)
    {
        std::unordered_map<int,float>::const_iterator got = cache.find(n);
        if(got == cache.end())
        {
            got = cache.emplace(n, compute_wN(n)).first;
        }
        return got->second;
    }
    float compute_wN(int n) const
    {
        return sin((M_PI / (2 * N)) * (n + 0.5f));
    }
};

int main()
{
    using namespace std;
    // this is to defeat the optimiser
    // and prevent compile-time evaluation of Wn
    std::istringstream ss ("5 4 6 7");
    int N = 10, n1 = 0, n2 = 1, n3 = 2;
    ss >> N >> n1 >> n2 >> n3;
    Transform t1(N);
    std::vector<float> v = {
        t1.original_cached_wN(n1),
        t1.improved_cached_wN(n2),
        t1.compute_wN(n3)
    };
    std::copy(v.begin(), v.end(), std::ostream_iterator<float>(cout, ", "));
    std::cout << std::endl;

    return 0;
}

Expected output:

0.987688, 0.891007, 0.707107,
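As a quick sanity check on the expected output: the string stream overrides the defaults with N = 5, n1 = 4, n2 = 6 and n3 = 7, so the three calls evaluate the same formula at three points:

$$W_n = \sin\!\left(\frac{\pi}{2N}\left(n + \tfrac{1}{2}\right)\right), \qquad N = 5$$

$$W_4 = \sin(0.45\pi) \approx 0.987688, \qquad W_6 = \sin(0.65\pi) \approx 0.891007, \qquad W_7 = \sin(0.75\pi) = \tfrac{\sqrt{2}}{2} \approx 0.707107$$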

Looking at the compiled output, it seems to me that the cost of searching and updating the map actually exceeds the cost of computing W(n).

Here is the code that Apple clang 7 emits for compute_wN() when compiled with -O3 -march=native:

movl    (%rdi), %eax
addl    %eax, %eax
vcvtsi2sdl  %eax, %xmm0, %xmm0
vmovsd  LCPI2_0(%rip), %xmm1    ## xmm1 = mem[0],zero
vdivsd  %xmm0, %xmm1, %xmm0
vcvtsi2ssl  %r15d, %xmm0, %xmm1
vaddss  LCPI2_1(%rip), %xmm1, %xmm1
vcvtss2sd   %xmm1, %xmm1, %xmm1
vmulsd  %xmm0, %xmm1, %xmm0
callq   _sin

Frankly, that is far less code than the map operations involve.