Using string::find to search for distinct words: problems with short words like "is"

Keywords: word, string, find, search. Updated: 2023-10-16

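For an assignment, I am supposed to read in a passage of text and produce a list of the distinct words in it, along with how often each one occurs. For example, the phrase "pie eating pie smile" has 3 distinct words.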

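The main problem I am running into is that string::find sees words like "is" inside the word "comprise", so "is" is reported as already seen even when it never appears as a word of its own.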
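The false match is easy to reproduce, because std::string::find looks for a substring and knows nothing about word boundaries. The sketch below is only an illustration: contains_whole_word is a hypothetical helper, not something from the question, and it assumes that a "word" is a run of characters with no letter directly before or after it.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `word` occurs in `text` as a whole word,
// i.e. with no letter immediately before or after the match.
bool contains_whole_word(const std::string& text, const std::string& word) {
    if (word.empty()) return false;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        const std::size_t end = pos + word.size();
        const bool left_ok = pos == 0 ||
            !std::isalpha(static_cast<unsigned char>(text[pos - 1]));
        const bool right_ok = end == text.size() ||
            !std::isalpha(static_cast<unsigned char>(text[end]));
        if (left_ok && right_ok) return true;
        pos = text.find(word, pos + 1);  // keep scanning after a partial hit
    }
    return false;
}
```

With this helper, "comprise" no longer counts as containing "is", while "this is it" still does. It works, but checking boundaries by hand after every find call is more fragile than splitting the text into words first.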

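I would use the stream extraction operator to read words from the file, insert them into a std::set, and then print the distinct words that end up in it: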

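// Requires <algorithm>, <fstream>, <iostream>, <iterator>, <set>, <string>
std::ifstream in("yourfile.txt");  // std::istream cannot open a file by name
std::set<std::string> words {std::istream_iterator<std::string>(in),
                             std::istream_iterator<std::string>()};
// The set keeps one copy of each word; print them one per line.
std::copy(words.begin(), words.end(),
          std::ostream_iterator<std::string>(std::cout, "\n"));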

To get the frequencies as well, switch to std::map<std::string, size_t> and increment each word's entry as you read:

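// Requires <cstddef>, <fstream>, <map>, <string>
std::ifstream infile("yourfile.txt");
std::map<std::string, std::size_t> counts;
std::string word;
while (infile >> word)
  ++counts[word];  // a missing key is inserted with a count of 0 first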

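As written, this keeps the words in sorted (alphabetical) order. If you don't care about that, you may (or may not) gain some speed by using std::unordered_map instead.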

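Your algorithm should look like this:

1. Read a line.
2. Split it into tokens (words).
3. Increment the count for each token found.
4. Repeat from step 1 until EOF.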

You should be able to extract and process each token yourself.

Don't try to find tokens in the un-tokenized input.
Hint: look at std::unordered_map<string, size_t>. It lets you look up tokens and update their counts efficiently.
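The read, split, count loop described in this answer might be sketched as follows. The function name count_tokens is hypothetical, and the sketch assumes tokens are separated by whitespace only; stripping punctuation or folding case would be an extra step.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: read a line, split it into tokens, bump each token's
// count, and repeat until the stream is exhausted.
std::unordered_map<std::string, std::size_t> count_tokens(std::istream& in) {
    std::unordered_map<std::string, std::size_t> counts;
    std::string line;
    while (std::getline(in, line)) {      // read a line; stops at EOF
        std::istringstream tokens(line);
        std::string token;
        while (tokens >> token)           // split the line into tokens
            ++counts[token];              // increment that token's count
    }
    return counts;
}
```

Because each token is compared as a whole key, "comprise" and "is" land in separate entries and the substring problem does not arise. Unlike std::map, the unordered map has no defined iteration order, so sort the entries before printing if the output order matters.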