Vector of strings with a counter C++

Keywords: vector of strings, C++, counter. Updated: 2023-10-16

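I am reading in a text file, and each word gets its own slot in a vector. I then need to keep track of each word and work out how many times each unique word occurs, so that: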

There are three trees
trees trees

should output:

There 1 are 1 three 1 trees 3

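I would like to know how to use a vector of strings to keep track of each word. Would I make a vector of strings where each string has its own vector of int?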

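Don't drive nails with a screwdriver. A std::vector is not particularly useful for this task in its most basic form: simple frequency counting. Arbitrary input from standard input is best handled with an associative container, where the key is the input string and the value is the accumulated frequency.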

Unordered frequency counting

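The unordered map class std::unordered_map, keyed on std::string and mapped to a frequency counter for that string, can be used to track basic frequencies. For example: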

#include <iostream>
#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, unsigned> m;
    std::string word;
    while (std::cin >> word)
        ++m[word]; // increment the count for this word

    // print each distinct word with its count (iteration order is unspecified)
    for (auto const& pr : m)
        std::cout << pr.first << ':' << pr.second << '\n';
}

Lexicographically ordered frequencies

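Note: the associative container std::unordered_map has no particular order (hence the name). If you need lexicographic order, simply use a regular std::map instead. For example: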

#include <iostream>
#include <map>
#include <string>

int main()
{
    std::map<std::string, unsigned> m;
    std::string word;
    while (std::cin >> word)
        ++m[word]; // increment the count for this word

    // std::map iterates its keys in sorted (lexicographic) order
    for (auto const& pr : m)
        std::cout << pr.first << ':' << pr.second << '\n';
}

Position-preserving frequency counting

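While computing the frequency counters, it is also possible to keep the positions at which each word occurs in the input stream, and it takes only a little more code. Choose an unordered or ordered associative container as before, but instead of mapping to unsigned, map to std::vector<unsigned>, into which we push a running word counter as we consume the input words. The size of each vector still gives the frequency count, while the vector itself holds the positions at which the associated word appeared in the input stream. For example: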

#include <iostream>
#include <map>
#include <string>
#include <vector>

int main()
{
    std::map<std::string, std::vector<unsigned int>> m;
    std::string word;
    unsigned ctr = 0; // running word position, 1-based after the pre-increment
    while (std::cin >> word)
        m[word].push_back(++ctr);

    for (auto const& pr : m)
    {
        // the vector's size is the word's frequency
        std::cout << pr.first << ':' << pr.second.size() << " { ";
        for (auto pos : pr.second)
            std::cout << pos << ' ';
        std::cout << "}\n";
    }
}

This produces output of the following form:

word : frequency { n1 n2 n3... }

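Here word is a distinct word, frequency is its overall frequency in the input stream, and n1, n2, n3, ... are the positions (starting from 1) at which the word appeared during processing.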
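As a worked example of that output format, the sketch below wraps the same position-tracking logic in a function that returns the formatted text instead of printing it. The name `describe_positions` is an assumption made for this illustration. The layout follows the position-tracking program, which prints no spaces around the colon.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): builds the word -> positions map
// and renders one "word:frequency { n1 n2 ... }" line per distinct word.
std::string describe_positions(std::istream& in)
{
    std::map<std::string, std::vector<unsigned>> m;
    std::string word;
    unsigned ctr = 0;
    while (in >> word)
        m[word].push_back(++ctr);  // positions are 1-based

    std::ostringstream out;
    for (auto const& pr : m)
    {
        out << pr.first << ':' << pr.second.size() << " { ";
        for (auto pos : pr.second)
            out << pos << ' ';
        out << "}\n";
    }
    return out.str();
}
```

For the input "There are three trees trees trees" this returns four lines: `There:1 { 1 }`, `are:1 { 2 }`, `three:1 { 3 }` and `trees:3 { 4 5 6 }`. "There" sorts first because std::map compares strings by character code, and uppercase letters precede lowercase ones in ASCII.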

Hopefully one of these approaches is useful.

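You can use the multiset class in C++, which keeps track of how many times you have added each word to the set. Also remember that you can read whole words from a stream in C++, and it automatically skips any whitespace characters.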

I will read from stdin in this example (note that I have not compiled this; it is only meant to show the idea).

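#include <iostream>
#include <set>
#include <string>
using namespace std;

int main(){
    string word;
    multiset<string> ocurrences;
    while(cin >> word){
        ocurrences.insert(word);
    }
    // Visit each distinct word once: upper_bound jumps past its duplicates,
    // so a repeated word is not printed once per copy.
    for(auto it = ocurrences.begin(); it != ocurrences.end();
        it = ocurrences.upper_bound(*it)){
        cout << *it << " " << ocurrences.count(*it) << " ";
    }
}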

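As mentioned in the comments, if you want to print the words in order of first occurrence, just keep a vector<string> and add each word you read to it (if it is not already in the set), then iterate over this vector instead of the set.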

#include <iostream>
#include <set>
#include <string>
#include <vector>
using namespace std;

int main(){
    string word;
    vector<string> words;
    multiset<string> ocurrences;
    while(cin >> word){
        if(ocurrences.count(word) == 0) // Is this the first time we see this word?
            words.push_back(word);
        ocurrences.insert(word);
    }
    for(const string& w : words){ // Iterate over the words in the order
                                  // they appeared in the input.
        cout << w << " " << ocurrences.count(w) << " ";
    }
}

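One more thing: even though a multiset is a better fit for this particular problem, what you propose in your question is called a map, a data structure that associates keys with values (possibly of different types). C++ already has a map implementation. In this case you would need a map<string, int> that associates each word with the number of times it occurs.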
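The map<string, int> idea can be combined with the first-occurrence vector from the previous snippet. The following is only a sketch under that assumption, and the function name `count_in_first_seen_order` is made up for illustration. It returns the result as a string in the layout the question asks for.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): counts words with a
// map<string, int> and reports them in order of first appearance.
std::string count_in_first_seen_order(std::istream& in)
{
    std::map<std::string, int> counts;
    std::vector<std::string> order;  // distinct words, in first-seen order
    std::string word;
    while (in >> word)
    {
        // operator[] creates a missing entry with value 0, so a
        // pre-increment value of 0 means the word is new.
        if (counts[word]++ == 0)
            order.push_back(word);
    }

    std::ostringstream out;
    bool first = true;
    for (const std::string& w : order)
    {
        if (!first)
            out << ' ';
        out << w << ' ' << counts.at(w);
        first = false;
    }
    return out.str();
}
```

With the question's two input lines ("There are three trees" followed by "trees trees"), this yields `There 1 are 1 three 1 trees 3`. Compared with the multiset version, each distinct word is stored once, and its count is a single integer rather than a run of duplicate elements.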

You can do this by accumulating a dictionary over the stream of words and using C++17 structured bindings:

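#include <iostream>
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>
#include <unordered_map>

int main()
{
    std::istringstream words( "There are three trees trees trees" );
    // Fold the word stream into a word -> count dictionary.
    // The accumulator is taken and returned by value: std::accumulate moves
    // it between steps from C++20 on, and copies it once per word before that.
    auto dic = std::accumulate(
        std::istream_iterator< std::string >( words ) ,
        std::istream_iterator< std::string >( ) ,
        std::unordered_map< std::string , int >( ) ,
        []( auto map , const std::string & word )
        {
            // try_emplace inserts { word , 0 } only if the key is absent
            auto [ it , inserted ] = map.try_emplace( word , 0 );
            ++ it->second;
            return map;
        } );
    for ( const auto & [ key , value ] : dic )
    {
        std::cout << key << ": " << value << '\n';
    }
}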

Live on Coliru (though with some warnings)

> trees: 3
> three: 1
> There: 1
> are: 1