Finding anagrams in a word list

Keywords: find, word list. Updated: 2023-10-16

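I have a list of words and a file containing a number of anagrams. These anagrams are words from the word list. I need to develop an algorithm that finds the matching words and writes them to an output file. So far, the code I have developed only works for the first two words. In addition, I can't get the code to handle strings that contain numbers anywhere in them. Please tell me how to fix the code.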

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main (void)
{
    int x = 0, y = 0;
    int a = 0, b = 0;
    int emptyx, emptyy;
    int match = 0;
    ifstream f1, f2;
    ofstream f3;
    string line, line1[1500], line2[50];
    size_t found;
    f1.open ("wordlist.txt");
    f2.open ("file.txt");
    f3.open ("output.txt");
    while (f1.eof() == 0)
    {
        getline (f1, line);
        line1[x] = line;
        x++;
    }
    while (f2.eof() == 0)
    {
        getline (f2, line);
        line2[y] = line;
        y++;
    }
    //finds position of last elements
    emptyx = x-1;
    emptyy = y-1;
    //matching algorithm
    for (y = 0; y <= emptyy; y++)
    {
        for (x = 0; x <= emptyx; x++)
        {
            if (line2[y].length() == line1[x].length())
            {
                for (a = 0; a < line1[x].length(); a++)
                {
                    found = line2[y].find(line1[x][a]);
                    if (found != string::npos)
                    {
                        match++;
                        line2[y].replace(found, 1, 1, '.');
                        if (match == line1[x].length())
                        {
                            f3 << line1[x] << ", ";
                            match = 0;
                        }
                    }
                }
            }
        }
    }
    f1.close();
    f2.close();
    f3.close();
    return 0;
}
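Reading the loop above, two pieces of state leak from one candidate to the next: `line2[y]` is overwritten with `'.'` characters and never restored, so later candidates are compared against a partly blanked-out string, and `match` is reset only after a full match, so a partial count carries over into the next candidate. The `eof()`-controlled reads also store one extra empty entry when a file ends with a newline, and the fixed-size arrays are written past their end once a file has more lines than they hold. Below is a minimal sketch of the same cross-off approach with those issues addressed, assuming one word per line in each file; `readLines`, `isAnagramOf` and `findMatches` are hypothetical helper names, and stripping a trailing `'\r'` is an added assumption for files with Windows line endings.

```cpp
#include <cstddef>
#include <istream>
#include <string>
#include <vector>

// Read every line of a stream into a growable vector (no fixed array size).
// Looping on getline() itself means a failed read at end of file is never
// stored, unlike testing eof() before reading.
std::vector<std::string> readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        // Assumption: drop a trailing '\r' in case of Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        lines.push_back(line);
    }
    return lines;
}

// True if `candidate` uses exactly the characters of `anagram`.
// `anagram` is taken by value, so the caller's string is never modified,
// and there is no counter that could carry over between candidates.
bool isAnagramOf(const std::string& candidate, std::string anagram) {
    if (candidate.size() != anagram.size()) return false;
    for (char c : candidate) {
        std::size_t found = anagram.find(c);
        if (found == std::string::npos) return false;  // character missing
        anagram.erase(found, 1);  // cross it off so it cannot be reused
    }
    return true;  // every character was matched exactly once
}

// For each anagram, collect every word-list entry that matches it.
std::vector<std::vector<std::string>> findMatches(
    const std::vector<std::string>& words,
    const std::vector<std::string>& anagrams) {
    std::vector<std::vector<std::string>> result;
    for (const std::string& a : anagrams) {
        std::vector<std::string> matches;
        for (const std::string& w : words)
            if (isAnagramOf(w, a)) matches.push_back(w);
        result.push_back(matches);
    }
    return result;
}
```

In `main`, the files can be opened as before and passed to `readLines`; each inner vector returned by `findMatches` holds the word-list entries for the corresponding anagram, ready to be written to output.txt. Because `find` treats every `char` alike, digits inside a word are matched the same way as letters.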

Step 1: Build an index whose keys are the sorted characters of each word in the word list and whose values are the words themselves:

act   -  cat
act   -  act
dgo   -  dog
...
aeeilnppp - pineapple
....
etc...

Step 2: For each anagram you want to look up, sort its characters, then look up the sorted key in the index and retrieve every word stored under it.
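The two steps can be sketched in C++ as follows, assuming the word list has already been read into a `std::vector<std::string>`. `sortedKey`, `buildIndex` and `lookup` are illustrative names, and `std::unordered_map` is one possible container for the index (a `std::map` keyed the same way would also work). Sorting the characters gives every anagram of a word the same key, so a single lookup returns all of them.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Index key: the word's characters in sorted order ("cat" -> "act").
std::string sortedKey(std::string word) {
    std::sort(word.begin(), word.end());
    return word;
}

// Sorted key -> every word-list entry that produces that key.
using AnagramIndex = std::unordered_map<std::string, std::vector<std::string>>;

// Step 1: insert each word under its sorted key.
AnagramIndex buildIndex(const std::vector<std::string>& words) {
    AnagramIndex index;
    for (const std::string& w : words) index[sortedKey(w)].push_back(w);
    return index;
}

// Step 2: sort the anagram's characters and return all words stored
// under that key (an empty vector if there are none).
std::vector<std::string> lookup(const AnagramIndex& index,
                                const std::string& anagram) {
    auto it = index.find(sortedKey(anagram));
    return it == index.end() ? std::vector<std::string>{} : it->second;
}
```

For example, with an index built from `cat`, `act` and `dog`, `lookup(index, "tac")` returns both `cat` and `act`.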

Trying to improve on Mitch Wheat's solution:

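Storing both the sorted key and the word is actually unnecessary; just store the sorted string of each word in the list.
When we read a word from the file we have to sort it anyway to determine whether it equals a sorted string, and the index is keyed on the sorted strings, so storing the word alongside them doesn't help anyway.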
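A minimal sketch of this variant, assuming the same one-word-per-line input: only the sorted form of each word-list entry is kept, here in a `std::unordered_set` (an illustrative choice), and each word read from the file is sorted before the membership test. `sortedKey`, `buildSortedSet` and `hasAnagram` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Characters of the word in sorted order ("cat" -> "act").
std::string sortedKey(std::string word) {
    std::sort(word.begin(), word.end());
    return word;
}

// Keep only the sorted form of each word-list entry; anagrams collapse
// into one entry because they share the same sorted string.
std::unordered_set<std::string> buildSortedSet(
    const std::vector<std::string>& words) {
    std::unordered_set<std::string> sorted;
    for (const std::string& w : words) sorted.insert(sortedKey(w));
    return sorted;
}

// A word from the file must be sorted before it can be compared.
bool hasAnagram(const std::unordered_set<std::string>& sorted,
                const std::string& word) {
    return sorted.count(sortedKey(word)) > 0;
}
```

Since `cat` and `act` both sort to `act`, a list containing both yields a single stored entry.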

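Build a "position-independent" hash (one that ignores character order, so every anagram of a word hashes to the same value) from each word in the word list, and store the sorted string in the hash table under that hash.
For each word in the file, compute its "position-independent" hash and check the hash table.
On a hit, sort the word and compare it with each sorted string stored at that slot (collisions!).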
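The answer does not name a hash function, so the sketch below assumes one: the sum of a per-character mixing function, which is position-independent because addition is commutative (a product of per-character values would be too). `mixChar`, `orderFreeHash`, `sortedKey` and `AnagramTable` are hypothetical names, and the mixing constants are arbitrary odd 64-bit multipliers. A word whose hash is absent is rejected without sorting; only a hit pays for the sort and the comparisons against colliding entries.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Assumed mixer: spreads each byte over 64 bits so that different
// character multisets are less likely to produce the same sum.
std::uint64_t mixChar(unsigned char c) {
    std::uint64_t x = c + 1;
    x *= 0x9E3779B97F4A7C15ULL;
    x ^= x >> 29;
    x *= 0xBF58476D1CE4E5B9ULL;
    return x;
}

// Position-independent hash: addition is commutative, so every
// permutation of the same characters gives the same value.
std::uint64_t orderFreeHash(const std::string& s) {
    std::uint64_t h = 0;
    for (unsigned char c : s) h += mixChar(c);
    return h;
}

std::string sortedKey(std::string s) {
    std::sort(s.begin(), s.end());
    return s;
}

class AnagramTable {
public:
    // Store each word's sorted string under its order-free hash.
    explicit AnagramTable(const std::vector<std::string>& words) {
        for (const std::string& w : words)
            buckets_[orderFreeHash(w)].push_back(sortedKey(w));
    }

    bool contains(const std::string& word) const {
        auto it = buckets_.find(orderFreeHash(word));
        if (it == buckets_.end()) return false;  // miss: no sort needed
        const std::string key = sortedKey(word);
        // Different words can collide, so compare with every entry.
        for (const std::string& s : it->second)
            if (s == key) return true;
        return false;
    }

private:
    std::unordered_map<std::uint64_t, std::vector<std::string>> buckets_;
};
```

For example, a table built from a list containing `pineapple` reports `contains("eppalpnie")` as true, because both strings hash to the same value and sort to `aeeilnppp`.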

Thoughts?