Explain the time complexity of these grouping functions

Keywords: function, time complexity, explanation. Last updated: 2023-10-16

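Here I have 800 classes derived from Base and a list of 8,000,000 objects of these types, in arbitrary order. The goal is to separate the list into the 800 types as efficiently as possible. I have written two functions to do this. The first is supposed to run in O(M*logN) time, where M is the size of the list and N is the number of concrete derived classes of Base, and the second is supposed to run in O(M) time. But when I time them, the second is clearly not log 800 times faster than the first. Did I get the time complexities wrong? Better yet, is there a faster function that makes the whole comparison moot?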

#include <iostream>
#include <list>
#include <unordered_map>
#include <array>
#include <ctime>
class Base {
public:
    virtual std::size_t ID() const = 0;
};
template <std::size_t N> class Derived : public Base {
    virtual std::size_t ID() const override {return N;}
};
const std::size_t NumDerivedTypes = 800;
template <typename Iterator>
std::unordered_map<std::size_t, std::list<typename Iterator::value_type>> separateWithMap (Iterator first, Iterator last) {
    std::unordered_map<std::size_t, std::list<typename Iterator::value_type>> map;
    while (first != last) {
        const auto it = map.find ((*first)->ID());
        if (it != map.end()) {
            it->second.emplace_back(*first);
        }
        else {
            std::list<typename Iterator::value_type> newGroup = {*first};
            map.emplace ((*first)->ID(), newGroup);
        }
        first++;
    }
    return map;
}
template <typename Iterator>
std::array<std::list<typename Iterator::value_type>, NumDerivedTypes> separateWithArray (Iterator first, Iterator last) {
    std::array<std::list<typename Iterator::value_type>, NumDerivedTypes> array;
    while (first != last) {
        array[(*first)->ID()].emplace_back(*first);
        ++first;
    }
    return array;
}
// ------------------------------- Testing -------------------------------
template <std::size_t N>
void build (std::list<Base*>& weapons) {
    weapons.emplace_back(new Derived<N>);
    build<N+1>(weapons);
}
template <>
void build<NumDerivedTypes> (std::list<Base*>&) {}  // End of recursion.
struct Timer {
    const std::clock_t begin = std::clock();
    ~Timer() {
        auto end = std::clock();
        std::cout << double(end - begin) / CLOCKS_PER_SEC << " seconds.\n";
    };
};
int main() {
    // M = scrambled.size(), N = number of concrete derived classes of Base.
    std::list<Base*> scrambled;
    for (std::size_t i = 0; i < 10000; i++)
        build<0>(scrambled);  // Assume 'scrambled' has many, many elements in some unknown order.
    std::cout << "scrambled.size() = " << scrambled.size() << '\n';  // 8000000
    {
        std::cout << "\nseparateWithMap started:\n";  // O(M*logN) time
        Timer timer;
        const std::unordered_map<std::size_t, std::list<Base*>> separated = separateWithMap (scrambled.begin(), scrambled.end());
        std::cout << "separateWithMap ended:\n";
    }
    {
        std::cout << "\nseparateWithArray started:\n";  // O(M) time
        Timer timer;
        const std::array<std::list<Base*>, NumDerivedTypes> partitioned = separateWithArray (scrambled.begin(), scrambled.end());
        std::cout << "separateWithArray ended:\n";
    }
}

Output:

scrambled.size() = 8000000
separateWithMap started.
separateWithMap ended.
30.318 seconds.
separateWithArray started.
separateWithArray ended.
22.869 seconds.

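By the way, both functions do successfully separate the objects into their respective types (tested), but for obvious reasons I don't show that in the output.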

"The first is supposed to run in O(M*logN) time, where M is the size of the list and N is the number of concrete derived classes of Base"

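Not really. unordered_map is a hash table, so lookup and insertion have constant complexity on average. The first function is therefore still O(M); it just does more work per element than the simple array version.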
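For contrast, the O(M*logN) bound would apply if the grouping used std::map, which is a balanced tree with logarithmic lookup in the number of keys. The following is only a minimal sketch: the helper names groupWithTree and groupWithHash are hypothetical, and plain integer IDs stand in for the values returned by Base::ID().

```cpp
#include <cstddef>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not from the original post): groups IDs with std::map.
// Each operator[] call is O(log N) for N distinct keys, so M elements cost O(M log N).
std::map<std::size_t, std::vector<std::size_t>>
groupWithTree(const std::vector<std::size_t>& ids) {
    std::map<std::size_t, std::vector<std::size_t>> groups;
    for (std::size_t id : ids)
        groups[id].push_back(id);
    return groups;
}

// Hypothetical helper: the same grouping with std::unordered_map.
// Each operator[] call is O(1) on average, so M elements cost O(M) on average.
std::unordered_map<std::size_t, std::vector<std::size_t>>
groupWithHash(const std::vector<std::size_t>& ids) {
    std::unordered_map<std::size_t, std::vector<std::size_t>> groups;
    for (std::size_t id : ids)
        groups[id].push_back(id);
    return groups;
}
```

Both helpers produce the same groups; only the per-lookup cost differs, which is why swapping the hash table for an array does not yield a log 800 speedup.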

As a side note, using operator[] would simplify your logic slightly:

for (; first != last; ++first) {
    map[(*first)->ID()].emplace_back(*first);
}

That is exactly like your array version.
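Put together, the simplified map version might look like the sketch below. The name separateWithMapSimplified is hypothetical, the small Base and Derived definitions are stand-ins so the block compiles on its own, and std::iterator_traits is used in place of Iterator::value_type as an assumption so that raw pointers would also work as iterators.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>

// Stand-in class hierarchy so this sketch is self-contained.
struct Base {
    virtual ~Base() = default;
    virtual std::size_t ID() const = 0;
};

template <std::size_t N>
struct Derived : Base {
    std::size_t ID() const override { return N; }
};

// Hypothetical name: groups the elements of [first, last) by their ID().
template <typename Iterator>
std::unordered_map<std::size_t,
                   std::list<typename std::iterator_traits<Iterator>::value_type>>
separateWithMapSimplified(Iterator first, Iterator last) {
    std::unordered_map<std::size_t,
                       std::list<typename std::iterator_traits<Iterator>::value_type>> map;
    for (; first != last; ++first) {
        // operator[] default-constructs an empty list the first time an ID is seen,
        // so no separate find/emplace branch is needed.
        map[(*first)->ID()].emplace_back(*first);
    }
    return map;
}
```

It can be called the same way as the original, for example separateWithMapSimplified(scrambled.begin(), scrambled.end()).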