Need to create a word matcher in C++

Updated: 2023-10-16

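I need to create a word matcher that counts how many times particular words are mentioned in a text file. Below is what I have done so far, and I'm not sure what I'm doing wrong. One text file contains a long paragraph; the other contains just a few words. I need to compare the two files: for example, if the word "answers" is in the short file, I need to check it against the long paragraph to see how many times it appears, and then print a report at the end of the program showing this.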

E.g. and - 6 times, but - 0 times, it - 23 times.

^^ Something like this. Not sure how to start on it.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    ifstream infile("text1.txt");
    if(!infile)
    {
        cout << "Error";
    }
    string words[250];
    int counter = 0;
    while (!infile.eof() )
    {
        infile >> words[counter];
        counter++;
    }
    ifstream infile2("banned.txt");
    if(!infile2)
    {
        cout << "Error";
    }
    string bannedwords[250];
    counter = 0;
    while (!infile2.eof() )
    {
        infile2 >> words[counter];
        counter++;
    }
    int eatcount= 0;
    int orcount = 0;
    int hellocount = 0;
    int number;
    for(int i=0; i<200; i++)
    {
        for(int j = 0; j < 8; j++)
        {
            if ( words[i] == bannedwords[j])
            {
                cout << words[i] << " ";
                if (words[i]=="eat")
                {
                    eatcount++;
                }
                else if (words[i] == "or")
                {
                    orcount++;
                }
                else if (words[i]== "hello")
                {
                    hellocount++;
                }
            }
        }
    }
    cout << endl;
    cout<< "eat was found "<<eatcount<<" times";
    cout << endl;
    cout<< "or was found "<<orcount<<" times";
    cout << endl;
    cout<< "hello was found "<<hellocount<<" times";
    system("pause");
}
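
A few things in the posted code are likely to cause trouble. The second loop reads into words rather than bannedwords, so bannedwords stays empty and the text words are overwritten. Looping on !infile.eof() can run one extra iteration after the last word. The fixed bounds 200 and 8 ignore how many words were actually read, and the counters are hard-coded for three words. The following is only a sketch of the same nested-loop idea with those points addressed; the helper names readWords, countEach, and formatReport are hypothetical and not part of the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every whitespace-separated token from a stream.
// Testing the extraction itself stops the loop as soon as a read fails.
std::vector<std::string> readWords(std::istream& in)
{
    std::vector<std::string> result;
    std::string word;
    while (in >> word)
        result.push_back(word);
    return result;
}

// counts[j] is how often banned[j] occurs in text (exact, case-sensitive match).
std::vector<int> countEach(const std::vector<std::string>& text,
                           const std::vector<std::string>& banned)
{
    std::vector<int> counts(banned.size(), 0);
    for (std::size_t i = 0; i < text.size(); ++i)
        for (std::size_t j = 0; j < banned.size(); ++j)
            if (text[i] == banned[j])
                ++counts[j];
    return counts;
}

// Build the end-of-program report, one "word - N times" line per banned word.
std::string formatReport(const std::vector<std::string>& banned,
                         const std::vector<int>& counts)
{
    assert(banned.size() == counts.size());
    std::ostringstream out;
    for (std::size_t j = 0; j < banned.size(); ++j)
        out << banned[j] << " - " << counts[j] << " times\n";
    return out.str();
}
```

With files, one would pass std::ifstream objects for text1.txt and banned.txt to readWords and print the string returned by formatReport. Punctuation attached to a word (such as "eat,") is not stripped here, so such tokens would not match.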

Why not use std::multiset?

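// needs #include <set> in addition to the headers in the question
ifstream infile("text1.txt");
if(!infile)
{
    cout << "Error";
}
std::multiset<string> words;
string tmp;
// test the extraction itself; looping on !infile.eof() can process the last word twice
while (infile >> tmp)
{
    words.insert(tmp);
}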

Then use a map for the banned words as well:

// needs #include <map>
ifstream infile2("banned.txt");
if(!infile2)
{
    cout << "Error";
}
std::map<string, int> bannedwords;
string bannedword;
while (infile2 >> bannedword)
{
    bannedwords[bannedword] = 0;   // the count is filled in later
}

Then you can use std::multiset::count(string) to look up the words without all the extra loops. You only need a single loop over your list of banned words. For example:

std::map<string, int>::iterator bannedwordIter = bannedwords.begin();
for( ; bannedwordIter != bannedwords.end(); ++bannedwordIter )
{
  bannedwordIter->second = words.count(bannedwordIter->first);
  // you could print here as you process, or have another loop that prints it all after you finish
  cout << bannedwordIter->first << " - " << bannedwordIter->second << " times." << endl;
}
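
Putting the pieces of this answer together, a minimal sketch might look like the following. The function names loadWords, countBanned, and countBannedInText are hypothetical, and the code assumes words are separated by whitespace and compared case-sensitively.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Load every whitespace-separated word of the long text into a multiset,
// which keeps duplicates so count() can report how often a word occurs.
std::multiset<std::string> loadWords(std::istream& in)
{
    std::multiset<std::string> words;
    std::string word;
    while (in >> word)
        words.insert(word);
    assert(!in.bad());  // reading should stop at end of input, not on an I/O error
    return words;
}

// For each banned word read from bannedIn, store its number of occurrences.
std::map<std::string, int> countBanned(const std::multiset<std::string>& words,
                                       std::istream& bannedIn)
{
    std::map<std::string, int> result;
    std::string word;
    while (bannedIn >> word)
        result[word] = static_cast<int>(words.count(word));
    return result;
}

// Convenience wrapper for trying the idea on in-memory text instead of files.
std::map<std::string, int> countBannedInText(const std::string& text,
                                             const std::string& bannedList)
{
    std::istringstream textIn(text);
    std::istringstream bannedIn(bannedList);
    return countBanned(loadWords(textIn), bannedIn);
}
```

For the files in the question, loadWords would receive an std::ifstream opened on text1.txt and countBanned one opened on banned.txt. Iterating over the returned map then gives the report, in alphabetical order of the banned words.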

A minimal approach would be to use regular expressions, like this:

#include <iostream>
#include <fstream>
#include <iterator>   // std::distance
#include <string>
#include <regex>
using namespace std;
unsigned countMatches(std::istream &is, std::string const &word)
{
    string text;
    unsigned count(0);    
    std::regex  const expression(word);
    while (getline(is, text)) {
        count += distance(sregex_iterator(
            text.begin(), text.end(), expression), sregex_iterator());
    }
    return count;
}

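So you just pass it the input stream (in your case, an input file stream), and it counts the occurrences of the specified word after building a regular expression that matches that word:
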
int main()
{
    ifstream ifs;
    ifs.open("example_text_file.txt");
    cout << countMatches(ifs, "word_you_want_to_search_for") << endl;
    return 0;
}
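
One caveat with the regex version: std::regex(word) matches the pattern anywhere in a line, so searching for "or" also counts the "or" inside "word", and characters such as "." or "+" in the search word are treated as regex syntax. A possible variation, sketched below, escapes the word and wraps it in \b word boundaries. The names escapeRegex, countWholeWord, and countWholeWordInText are hypothetical, and the sketch assumes the search word starts and ends with a letter, digit, or underscore, since \b only matches next to such characters.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>

// Backslash-escape characters that are special in the default (ECMAScript)
// regex grammar, so the word is matched literally.
std::string escapeRegex(const std::string& word)
{
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string escaped;
    for (char c : word) {
        if (special.find(c) != std::string::npos)
            escaped += '\\';
        escaped += c;
    }
    return escaped;
}

// Count occurrences of word as a whole word, reading the stream line by line.
std::size_t countWholeWord(std::istream& is, const std::string& word)
{
    assert(!word.empty());
    const std::regex expression("\\b" + escapeRegex(word) + "\\b");
    std::size_t count = 0;
    std::string line;
    while (std::getline(is, line)) {
        count += static_cast<std::size_t>(std::distance(
            std::sregex_iterator(line.begin(), line.end(), expression),
            std::sregex_iterator()));
    }
    return count;
}

// Convenience wrapper for trying the function on in-memory text.
std::size_t countWholeWordInText(const std::string& text, const std::string& word)
{
    std::istringstream is(text);
    return countWholeWord(is, word);
}
```

To produce the report from the question, one would call countWholeWord once per banned word, reopening or rewinding the text stream before each call, which is simple but rereads the file every time.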