Putting words from a file into a hash map (C++)

Keywords: map, C++, file, words      Updated: 2023-10-16

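So, I have a fairly long text file (10k+ words), and I'm trying to put each unique word into a hash map using the standard map library.

I have a while loop that reads each word from the file. The problem is that this while loop never seems to end. I even put an if statement inside the loop so that it breaks out once it reaches eof(). It still doesn't end. Here is my code so far: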

#include <iostream>
#include <map>
#include <string>
#include <fstream>
#include <cctype>
using namespace std;

string lowerCase(string isUpper);
int main()
{
//create hash map
map<string, int> stringCounts;
//temp string
string nextString;
//import file/write file
ofstream writeFile;
ifstream gooseFile;
//open file to read from
gooseFile.open("goose.txt");
if (gooseFile.is_open()) {
    //read file word by word
    while (gooseFile >> nextString) { //WORKS DO NOT CHANGE
        //check for punctuation
        for (int i = 0; i < nextString.length(); i++) { //WORKS DO NOT CHANGE
            if (nextString[i] == ',' || nextString[i] == '!' || nextString[i] == ';' || nextString[i] == '-' || nextString[i] == '.' || nextString[i] == '?' || nextString[i] == ':' || nextString[i] == '"' || nextString[i] == '(' || nextString[i] == ')' || nextString[i] == '_' || nextString[i] == '\'') {
                nextString.erase(i, i);
                i--;
            }
        }
        //put all into lowercase
        nextString = lowerCase(nextString); //WORKS DO NOT CHANGE
        //cout << nextString << endl;
        //increment key value
        stringCounts[nextString]++;
        if (gooseFile.eof())
            break;
    }
}
//close current file
gooseFile.close();
cout << "I GOT HERE!";
//now print to an output file
writeFile.open("output.txt");
if (writeFile.is_open()) {
    cout << "ITS OPEN AGAIN";
    //write size of map
    writeFile << "The size of the hash map is " << stringCounts.size() << endl;
    //write all words in map
    //create iterator
    map<string, int>::iterator i = stringCounts.begin();
    //iterate through map 
    while (i != stringCounts.end()) {
        writeFile << "The key and value is : (" << i->first << "," << i->second << ")\n";
        i++;
    }
}
else
    cout << "CANT OPEN\n";
}

string lowerCase(string isUpper)
{
    string toReplace = isUpper;
    for (int i = 0; i < toReplace.length(); i++) {
        if (toReplace[i] >= 65 && toReplace[i] <= 90) {
            toReplace[i] = tolower(toReplace[i]);
        }
    }
    return toReplace;
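}

nextString.erase(i, i);

I doubt this is what you want. The overload of string::erase that you are calling takes a position (where to start erasing) and a count (how many characters to erase). So this line erases as many characters as the index of that character in the string. For example, if i is 0, it erases 0 characters. Combine that fact with the next line:

i--;

If the first character is punctuation, nothing is erased, i-- is undone by the loop's i++, and i is back at 0 looking at the same character, so the for loop never ends. If you only want to erase 1 character, you can do this: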

nextString.erase(i, 1);
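
To make the fix concrete, here is a minimal sketch of the corrected loop. The helper name stripWithLoop and the punctuation string are assumptions for illustration (the string just collects the characters tested in the question's if statement). It advances the index only when nothing was erased, which avoids the i-- step altogether.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the question): removes the punctuation
// characters listed in the question's if statement from a single word.
std::string stripWithLoop(std::string word) {
    // Assumed equivalent of the chain of == comparisons in the question.
    const std::string punctuation = ",!;-.?:\"()_'";
    for (std::size_t i = 0; i < word.length(); ) {
        if (punctuation.find(word[i]) != std::string::npos) {
            // erase(pos, count): remove exactly one character at index i.
            // The next character slides into index i, so do not advance.
            word.erase(i, 1);
        } else {
            ++i;  // nothing erased, move on to the next character
        }
    }
    return word;
}
```

Calling stripWithLoop on a word that starts with punctuation, such as ",hello!", now terminates instead of spinning at index 0.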

That said, it would be better to replace the entire for loop with the remove/erase idiom.

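// requires <algorithm> and <cctype>
auto new_end = std::remove_if(nextString.begin(), nextString.end(),
        [](char c) {
            // return true if c is punctuation; std::ispunct is one option,
            // though it matches more characters than the list in the question
            return std::ispunct(static_cast<unsigned char>(c)) != 0;
        });
nextString.erase(new_end, nextString.end());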