Why am I getting extra index values when multi-mapping words to lines from text file C++?

Updated: 2023-10-16

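I am working on a multimap program that reads a text file, strips the punctuation, and then builds an index of each word against the lines it appears on. The code compiles and runs, but the output is not what I want. I am fairly sure the problem is in how I handle punctuation. Whenever a word is followed by a period, the word is counted twice, even though I exclude punctuation. The last word is then printed several more times, claiming it appears on lines that do not exist in the file. Any help would be appreciated!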

Input file:

dogs run fast.
dogs bark loud.
cats sleep hard.
cats are not dogs.
Thank you.
#

C++ code:

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <map>
using namespace std;
int main(){
    ifstream input;
    input.open("NewFile.txt");
    if ( !input )
    {
        cout << "Error opening file." << endl;
        return 0;
    }
    multimap< string, int, less<string> >  words;
    int line; //int variable line
    string word;//string variable word
    // For each line of text, the length of input, increment line
    for (line = 1; input; line++)
    {
        char buf[ 255 ];//create a character with space of 255
        input.getline( buf, 128 );//buf is pointer to array of chars where
        //extracted, 128 is maximum num of chars to write to s.
        // Discard all punctuation characters, leaving only words
        for ( char *p = buf;
              *p != '\0';
              p++ )
        {
            if ( ispunct( *p ) )
                *p = ' ';
        }
        //
        istringstream i( buf );
        while ( i )
        {
            i >> word;
            if ( word != "" )
            {
                words.insert( pair<const string,int>( word, line ) );
            }
        }
    }
    input.close();
    // Output results
    multimap< string, int, less<string> >::iterator it1;
    multimap< string, int, less<string> >::iterator it2;

    for ( it1 = words.begin(); it1 != words.end(); )
    {
        it2 = words.upper_bound( (*it1).first );
        cout << (*it1).first << " : ";
        for ( ; it1 != it2; it1++ )
        {
            cout << (*it1).second << " ";
        }
        cout << endl;
    }
    return 0;
}

Output:

Thank : 5
are : 4
bark : 2
cats : 3 4
dogs : 1 2 4 4
fast : 1 1
hard : 3 3
loud : 2 2
not : 4
run : 1
sleep : 3
you : 5 5 6 7

Expected output:

Thank : 5
are : 4
bark : 2
cats : 3 4
dogs : 1 2 4 
fast : 1 
hard : 3 
loud : 2 
not : 4
run : 1
sleep : 3
you : 5 

Thanks in advance for your help!

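You are not removing the punctuation, you are replacing it with spaces. After the last word on a line, the istringstream tries to parse what is left, and that extraction fails. A failed i >> word leaves word unchanged, so the previous word is inserted a second time. Because word is declared outside the line loop, the same thing happens on the "#" line and on the final failed getline, which is where the extra 6 and 7 for "you" come from. You should check whether the word was actually parsed: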

i >> word;
if (!i.fail()) {
    words.insert(pair<const string, int>(word, line));
}
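
To see the effect in isolation, here is a minimal sketch that contrasts the two loop shapes on a single line of text. The function names read_unchecked and read_checked are hypothetical and exist only for this illustration; the first mirrors the loop from the question, the second tests the extraction itself.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Mirrors the loop from the question: the stream state is tested before
// the read, so the final, failed extraction still pushes the stale value
// that `word` kept from the previous iteration.
std::vector<std::string> read_unchecked(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    std::string word;
    while (in) {
        in >> word;
        if (word != "") out.push_back(word);
    }
    return out;
}

// Tests the extraction itself, so a failed read never reaches push_back.
std::vector<std::string> read_checked(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    std::string word;
    while (in >> word) out.push_back(word);
    return out;
}
```

Called with "dogs run fast " (the first input line after the period has been turned into a space), read_unchecked returns four entries with "fast" repeated at the end, while read_checked returns the three words once each.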

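Since you are using C++, it is more convenient to avoid raw pointers and lean on the standard library instead. I would rewrite that part of your code like this: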

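// Read the file one line at a time; line is the 1-based line number.
// Looping on std::getline itself stops as soon as a read fails, which is
// safer than testing eof() before the read.
// Needs <algorithm>, <iterator> and <cctype> in addition to the original includes.
std::string buf;
for (line = 1; std::getline(input, buf); line++)
{
    istringstream i( buf );
    while ( i )
    {
        i >> word;
        // Only use word if the extraction actually succeeded
        if (!i.fail()) {
            // Copy every non-punctuation character into cleanWord.
            // The lambda (C++11) replaces std::ptr_fun, which was removed in C++17.
            std::string cleanWord;
            std::remove_copy_if(word.begin(), word.end(),
                                std::back_inserter(cleanWord),
                                [](unsigned char c) { return std::ispunct(c) != 0; }
            );
            if (!cleanWord.empty()) {
                words.insert(pair<const string, int>(cleanWord, line));
            }
        }
    }
}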
input.close();
// Output results
multimap< string, int, less<string> >::iterator it1;
multimap< string, int, less<string> >::iterator it2;
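
For reference, the pieces above can be put together into one self-contained sketch. This is an illustration rather than the answer's exact code: the helper names build_index and print_index are hypothetical, the index is built from any std::istream so it can be tried without a file, and a C++11 compiler is assumed for the lambda and auto.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: maps each word to every line number it appears on.
std::multimap<std::string, int> build_index(std::istream& input) {
    std::multimap<std::string, int> words;
    std::string buf;
    // The loop ends as soon as getline fails, so no phantom lines are counted.
    for (int line = 1; std::getline(input, buf); ++line) {
        std::istringstream tokens(buf);
        std::string word;
        // The body runs only when a token was actually extracted.
        while (tokens >> word) {
            // Drop punctuation instead of replacing it with spaces.
            std::string clean;
            std::remove_copy_if(word.begin(), word.end(),
                                std::back_inserter(clean),
                                [](unsigned char c) { return std::ispunct(c) != 0; });
            if (!clean.empty()) {
                words.insert(std::make_pair(clean, line));
            }
        }
    }
    return words;
}

// Hypothetical helper: prints each distinct word once, followed by its lines.
void print_index(const std::multimap<std::string, int>& words, std::ostream& out) {
    for (auto it = words.begin(); it != words.end(); ) {
        auto last = words.upper_bound(it->first);
        out << it->first << " :";
        for (; it != last; ++it) {
            out << ' ' << it->second;
        }
        out << '\n';
    }
}
```

In a real program you would open the file with std::ifstream, pass it to build_index, and call print_index(words, std::cout). On the sample input from the question, "dogs" is indexed on lines 1, 2 and 4, and "you" only on line 5.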