How to implement a C++ dictionary data structure without using the STL

Keywords: C++, dictionary data structure, STL. Updated: 2023-10-16

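I'm working on a project whose main goal is to load lists of words (many of them containing more than 15k words) into a data structure and then search that structure. I did a bit of research, and as far as I can tell a hash table would be the best choice (please correct me if I'm wrong; I also looked into tries).

Here's the tricky part: I can't use any STL in this project. So, as far as I can tell, I'll have to write my own hash table class or find one that performs well. I understand how hash tables work at a basic level, but I'm not sure I know enough to write a whole one myself.

I've looked around on Google and couldn't find any suitable sample code.

My question is: does anyone know how to do this in C++, and/or where I can find some code to get started? I need three basic operations on the table: insert, search, and delete.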

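Things to keep in mind when thinking about this:

Speed is the top priority! This needs to be lightning fast, without regard to system resources. From the reading I've done, a hash table can do better than O(log n).
Consider multithreading.
Can't use the STL.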

I think a sorted array of strings plus binary search should be very efficient.
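
The sorted-array idea could be sketched roughly as follows. This assumes the C library headers are allowed even though the STL is not, and the names `sort_words` and `find_word` are made up for illustration. Lookups take O(log n) string comparisons, but inserting or deleting a word afterwards means shifting array elements, so this fits best when the list is loaded once and then only searched.

```cpp
#include <cstdlib>
#include <cstring>

// Comparison callback for qsort: each array element is a `const char*`.
static int compare_words(const void* a, const void* b)
{
    const char* const* lhs = static_cast<const char* const*>(a);
    const char* const* rhs = static_cast<const char* const*>(b);
    return std::strcmp(*lhs, *rhs);
}

// Sort the word pointers once, after the whole list has been loaded.
void sort_words(const char** words, std::size_t count)
{
    std::qsort(words, count, sizeof(const char*), compare_words);
}

// Binary search over the sorted array.
// Returns the index of `key`, or -1 if it is not present.
long find_word(const char* const* words, std::size_t count, const char* key)
{
    std::size_t lo = 0;
    std::size_t hi = count; // search the half-open range [lo, hi)
    while (lo < hi)
    {
        const std::size_t mid = lo + (hi - lo) / 2;
        const int cmp = std::strcmp(words[mid], key);
        if (cmp == 0)
            return static_cast<long>(mid);
        if (cmp < 0)
            lo = mid + 1;   // key sorts after words[mid]
        else
            hi = mid;       // key sorts before words[mid]
    }
    return -1;
}
```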

std::unordered_map is not part of the STL.

http://www.cs.auckland.ac.nz/software/AlgAnim/hash_tables.html

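The restrictions aren't entirely clear, but assuming you can't use anything from std, you could write a simple class like the one below to do the job. We'll use an array of buckets to store the data, and a hash function to convert each string to a number in the range 0 to MAX_BUCKETS - 1. Each bucket holds a linked list of strings, so you can retrieve the information again. Insertion and lookup are typically O(1), provided the hash function spreads the words evenly across the buckets.

Note that for a more efficient solution you may want to use a vector (a resizable array) rather than the fixed-length array I've used here. The code also has minimal error checking and leaves room for other improvements, but it should get you started.

Note that you need to implement your own string hash function; you can find plenty of them online.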
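
As one possible starting point for such a hash function, here is a sketch of the 32-bit FNV-1a hash. The choice of FNV-1a and the names `fnv1a_hash` and `bucket_index` are assumptions made for illustration; they are not part of the original answer, and other string hashes would work too.

```cpp
#include <cstdint>

// 32-bit FNV-1a: for each byte, XOR it into the hash, then multiply by the FNV prime.
std::uint32_t fnv1a_hash(const char* word)
{
    std::uint32_t h = 2166136261u; // FNV offset basis
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(word); *p != 0; ++p)
    {
        h ^= *p;
        h *= 16777619u;            // FNV prime; arithmetic wraps modulo 2^32
    }
    return h;
}

// Map a word to a bucket in the range [0, bucket_count).
// bucket_count must be greater than zero.
unsigned int bucket_index(const char* word, unsigned int bucket_count)
{
    return fnv1a_hash(word) % bucket_count;
}
```

Inside the dictionary class, the hash member could then return something like `fnv1a_hash(word) % MAX_BUCKETS`.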

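// strlen, strcpy, strcmp and memset come from <cstring>; printf from <cstdio>.
#include <cstdio>
#include <cstring>

class dictionary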
{
    struct data
    {
        char* word = nullptr;
        data* next = nullptr;
        ~data()
        {
            delete [] word;
        }
    };
public:
    const unsigned int MAX_BUCKETS;
    dictionary(unsigned int maxBuckets = 1024)
        : MAX_BUCKETS(maxBuckets)
        , words(new data*[MAX_BUCKETS])
    {
        memset(words, 0, sizeof(data*) * MAX_BUCKETS);
    }
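    ~dictionary()
    {
        // Free every node in every chain, not just the head of each bucket.
        for (unsigned int i = 0; i < MAX_BUCKETS; ++i)
        {
            data* d = words[i];
            while (d != nullptr)
            {
                data* next = d->next;
                delete d;
                d = next;
            }
        }
        delete [] words;
    }
    // The class owns raw pointers, so forbid copying to avoid double deletes.
    dictionary(const dictionary&) = delete;
    dictionary& operator=(const dictionary&) = delete;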
    void insert(const char* word)
    {
        const auto hash_index = hash(word);
        data* node = new data;
        copy_string(node, word);
        // Walk the chain with a plain pointer (not a reference to the
        // bucket slot), so the bucket head is never overwritten.
        data* d = words[hash_index];
        if (d == nullptr)
        {
            words[hash_index] = node;
        }
        else
        {
            while (d->next != nullptr)
            {
                d = d->next;
            }
            d->next = node;
        }
    }
    void copy_string(data* d, const char* word)
    {
        const auto word_length = strlen(word)+1;
        d->word = new char[word_length];
        strcpy(d->word, word);
        printf("%s\n", d->word);
    }
    const char* find(const char* word) const
    {
        const auto hash_index = hash(word);
        // Walk the chain with a local pointer; a reference to the bucket
        // slot would overwrite the bucket head while iterating.
        const data* d = words[hash_index];
        while (d != nullptr)
        {
            printf("checking %s with %s\n", word, d->word);
            if (strcmp(d->word, word) == 0)
                return d->word;
            d = d->next;
        }
        return nullptr;
    }
private:

    unsigned int hash(const char* word) const
    {
        // :TODO: write your own hash function here
        const unsigned int num = 0; // :TODO:
        return num % MAX_BUCKETS;
    }
    data** words;
};
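
The question also asks for deletion, which the class above does not provide. A minimal sketch of unlinking a word from one bucket's chain might look like this. The `node` type stands in for the class's `data` struct, and `push_front` and `remove_from_chain` are hypothetical helpers written for this example only.

```cpp
#include <cstring>

// Hypothetical node type mirroring the `data` struct: one word plus a link.
struct node
{
    char* word = nullptr;
    node* next = nullptr;
    ~node() { delete [] word; }
};

// Helper for the example: allocate a node holding a copy of `word`,
// linked in front of `next`.
node* push_front(node* next, const char* word)
{
    node* n = new node;
    n->word = new char[std::strlen(word) + 1];
    std::strcpy(n->word, word);
    n->next = next;
    return n;
}

// Unlink and free the first node in the chain whose word equals `word`.
// `head` is the bucket slot itself, so removing the first node updates it.
// Returns true if a node was removed.
bool remove_from_chain(node*& head, const char* word)
{
    node** link = &head; // the pointer that currently points at the node under inspection
    while (*link != nullptr)
    {
        node* current = *link;
        if (std::strcmp(current->word, word) == 0)
        {
            *link = current->next; // bypass the node
            delete current;        // ~node releases the word buffer
            return true;
        }
        link = &current->next;
    }
    return false;
}
```

In the class, a remove member could apply the same loop to `words[hash(word)]`.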

http://wikipedia-clustering.speedblue.org/trie.php

The link above currently appears to be broken.

Alternative link: https://web.archive.org/web/20160426224744/http://wikipedia-clustering.speedblue.org/trie.php

Source code: https://web.archive.org/web/20160426224744/http://wikipedia-clustering.speedblue.org/download/libTrie-0.1.gz
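
For readers who cannot reach those links, the basic idea of a trie can be sketched as follows. This is not the linked library's code. It is a minimal illustration that assumes every word consists only of lowercase ASCII letters, and the names `trie_node`, `trie_insert` and `trie_contains` are invented for the example. A lookup visits one node per character, so its cost depends on the word's length rather than on how many words are stored.

```cpp
// Sketch only: assumes words contain just the lowercase ASCII letters 'a'..'z'.
struct trie_node
{
    trie_node* children[26] = {}; // all child pointers start as nullptr
    bool is_word = false;         // true if a stored word ends at this node

    ~trie_node()
    {
        for (trie_node* child : children)
            delete child;         // deleting nullptr is a no-op
    }
};

// Insert `word` below `root`.
// Returns false if the word contains a character outside 'a'..'z'; prefix
// nodes created before that point stay in place but are not marked as words.
bool trie_insert(trie_node& root, const char* word)
{
    trie_node* node = &root;
    for (const char* p = word; *p != '\0'; ++p)
    {
        if (*p < 'a' || *p > 'z')
            return false;
        trie_node*& child = node->children[*p - 'a'];
        if (child == nullptr)
            child = new trie_node;
        node = child;
    }
    node->is_word = true;
    return true;
}

// Returns true only if `word` was inserted as a whole word (not just a prefix).
bool trie_contains(const trie_node& root, const char* word)
{
    const trie_node* node = &root;
    for (const char* p = word; *p != '\0'; ++p)
    {
        if (*p < 'a' || *p > 'z')
            return false;
        node = node->children[*p - 'a'];
        if (node == nullptr)
            return false;
    }
    return node->is_word;
}
```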