Speed of associative array (map) in STL

Keywords: speed, map, associative array, STL. Updated: 2023-10-16

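I wrote a simple program to measure the speed of the STL. The code below takes 1.49 seconds on my Core i7-2670QM PC (2.2 GHz, turbo 3.1 GHz). If I remove the Employees[buf] = i%1000; part from the loop, it takes only 0.0132 seconds. So the hashing part takes 1.48 seconds. Why is it that slow?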

#include <string.h>
#include <iostream>
#include <map>
#include <utility>
#include <stdio.h>
#include <sys/time.h>
using namespace std;
extern "C" {
int get(map<string, int> e, char* s){
    return e[s];
}
int set(map<string, int> e, char* s, int value) {
    e[s] = value;
}
}
double getTS() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec/1000000.0;
}
int main()
{
    map<string, int> Employees;
    char buf[10];
    int i;
    double ts = getTS();
    for (i=0; i<1000000; i++) {
        sprintf(buf, "%08d", i);
        Employees[buf] = i%1000;
    }
    printf("took %f sec\n", getTS() - ts);
    cout << Employees["00001234"] << endl;
    return 0;
}

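Below is a proper C++ version of your code. Note that when passing the map to get/set you should obviously take it by reference.

UPDATE: Going further and seriously optimizing for the given test case:

Live On Coliru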

#include <iostream>
#include <boost/container/flat_map.hpp>
#include <chrono>
#include <cstdio>   // std::sprintf
#include <string>
using namespace std;
using Map = boost::container::flat_map<string, int>;
int get(Map &e, char *s) { return e[s]; }
int set(Map &e, char *s, int value) { return e[s] = value; }
using Clock = std::chrono::high_resolution_clock;
template <typename F, typename Reso = std::chrono::microseconds, typename... Args> 
Reso measure(F&& f, Args&&... args) {
    auto since = Clock::now();
    std::forward<F>(f)(std::forward<Args>(args)...);
    return chrono::duration_cast<Reso>(Clock::now() - since);
}
#include <boost/iterator/iterator_facade.hpp>
using Pair = std::pair<std::string, int>;
struct Gen : boost::iterators::iterator_facade<Gen, Pair, boost::iterators::single_pass_traversal_tag, Pair>
{
    int i;
    Gen(int i = 0) : i(i) {}
    value_type dereference() const { 
        char buf[10];
        std::sprintf(buf, "%08d", i);
        return { buf, i%1000 }; 
    }
    bool equal(Gen const& o) const { return i==o.i; }
    void increment() { ++i; }
};
int main() {
    Map Employees;
    const auto n = 1000000;
    auto elapsed = measure([&] {
            Employees.reserve(n);
            Employees.insert<Gen>(boost::container::ordered_unique_range, {0}, {n});
        });
    std::cout << "took " << elapsed.count() / 1000000.0 << " sec\n";
    cout << Employees["00001234"] << endl;
}

Prints

took 0.146575 sec
234

Old answer

This just used C++ where appropriate

Live On Coliru

#include <iostream>
#include <map>
#include <chrono>
#include <cstdio>
using namespace std;
int get(map<string, int>& e, char* s){
    return e[s];
}
int set(map<string, int>& e, char* s, int value) {
    return e[s] = value;
}
using Clock = std::chrono::high_resolution_clock;
template <typename Reso = std::chrono::microseconds>
Reso getElapsed(Clock::time_point const& since) {
    return chrono::duration_cast<Reso>(Clock::now() - since);
}
int main()
{
    map<string, int> Employees;
    std::string buf(10, '\0');
    auto ts = Clock::now();
    for (int i=0; i<1000000; i++) {
        buf.resize(std::sprintf(&buf[0], "%08d", i));
        Employees[buf] = i%1000;
    }
    std::cout << "took " << getElapsed(ts).count()/1000000.0 << " sec\n";
    cout << Employees["00001234"] << endl;
}

Prints:

took 0.470009 sec
234
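
The notion of "slow" depends, of course, on what you compare it to.

I ran your benchmark (using the standard chrono::high_resolution_clock instead of gettimeofday()) on MSVC 2013 in release configuration on a Core i7-920 at 2.67 GHz, and found a very similar result (1.452 s).

In your code, you basically do 1 million times:

an insertion in the map: Employees[buf]
an update in the map (copying a new value over the existing element): = i%1000

So I tried to understand better where the time is spent:

First, a map needs to keep its keys ordered, which is usually implemented with a binary tree. So I tried an unordered_map, which uses a flatter hash table, and gave it a very large bucket count to avoid collisions and rehashing. The result is 1.198 s.
So roughly 20% of the time (here) is needed to make sorted access to the map data possible (i.e. you can iterate through the map in key order: do you need this?)
Second, playing with the insertion order can really influence the timing. As Thomas Matthews pointed out in the comments: for benchmarking purposes you should use a random order.
Then, doing only an optimized insertion of the data (no search, no update) with emplace_hint() gives a time of 1.100 s.
So 75% of the time is needed to allocate and insert the data.
Finally, elaborating on the previous test, if you add an extra search and update after the emplace_hint(), the time goes slightly above the original time (1.468 s). This confirms that access to the map is only a fraction of the time and that most of the execution time is needed for the insertion.

Here is the code used to test the last point above: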

chrono::high_resolution_clock::time_point ts = chrono::high_resolution_clock::now();
for (i = 0; i<1000000; i++) {
    sprintf(buf, "%08d", i);
    Employees.emplace_hint(Employees.end(), buf, 0);
    Employees[buf] = i % 1000;  // matters for roughly 300 ms
}
chrono::high_resolution_clock::time_point te = chrono::high_resolution_clock::now();
cout << "took " << chrono::duration_cast<chrono::milliseconds>(te - ts).count() << " millisecs\n";

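Now, your benchmark does not depend only on the performance of the map: you also perform 1 million sprintf() calls to fill the buffer and 1 million conversions to string. If you use a map with integer keys instead (for example map<int, int>), you will notice that the whole test takes only 0.950 s instead of 1.450 s: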

30% of your benchmark time is caused not by the map, but by the many strings you handle.

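Of course, all of this is much slower than a vector. But a vector does not keep its elements sorted and cannot provide associative storage.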