Reading in only letters from a text file

Last updated: 2023-10-16

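I am trying to read a poem from a text file that contains commas, spaces, periods, and newline characters. I am using getline to read each separate word, and I do not want to read in any of the commas, spaces, periods, or newlines. As I read each word, I capitalize every letter and then call an insert function to insert the word as a separate node into a binary search tree. I don't know the best way to separate the words. I have been able to split the words on spaces, but the commas, periods, and newlines keep getting read in.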

Here is my text file:

Roses are red, violins are blue, Data Structures is the best, You and I both know it is true.

The code I am using is:

    string inputFile;
    cout << "What is the name of the text file?";
    cin >> inputFile;
    ifstream fin;
    fin.open(inputFile);
    //Input once
    string input;
    getline(fin, input, ' ');
    for (int i = 0; i < input.length(); i++)
    {
        input[i] = toupper(input[i]);
    }
    //check for duplicates: only insert a word that is not in the tree yet
    if (tree.Find(input, tree.Current, tree.Parent) == false)
    {
        tree.Insert(input);
        countNodes++;
        countHeight = tree.Height(tree.Root);
    }

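Basically, I am using getline(fin, input, ' ') to read my input.

I found a solution. I read a whole line of text into the variable line, then scan it character by character, keep only the letters, and store them in word. After that I can call the insert function to insert the node into the tree.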

    const int MAXWORDSIZE = 50;
    const int MAXLINESIZE = 1000;
    char word[MAXWORDSIZE], line[MAXLINESIZE];
    int lineIdx, wordIdx, lineLength;
    //get a line
    fin.getline(line, MAXLINESIZE - 1);
    lineLength = strlen(line);
    while (fin)
    {
        for (lineIdx = 0; lineIdx < lineLength;)
        {
            //skip over non-alphas, and check for end of line null terminator
            while (!isalpha((unsigned char)line[lineIdx]) && line[lineIdx] != '\0')
                ++lineIdx;
            //make sure not at the end of the line
            if (line[lineIdx] != '\0')
            {
                //copy alphas to word c-string, leaving room for the null terminator
                //(a run longer than MAXWORDSIZE - 1 letters is split into several words)
                wordIdx = 0;
                while (isalpha((unsigned char)line[lineIdx]) && wordIdx < MAXWORDSIZE - 1)
                {
                    word[wordIdx] = toupper((unsigned char)line[lineIdx]);
                    wordIdx++;
                    lineIdx++;
                }
                //make it a c-string with the null terminator
                word[wordIdx] = '\0';
                //THIS IS WHERE YOU WOULD INSERT INTO THE BST OR INCREMENT FREQUENCY COUNTER IN THE NODE
                if (tree.Find(word) == false)
                {
                    tree.Insert(word);
                    totalNodes++;
                    //output word
                    //cout << word << endl;
                }
                else
                {
                    tree.Counter();
                }
            }
        }
        //get the next line (the end of this loop is cut off in the post, so this tail is an assumed reconstruction)
        fin.getline(line, MAXLINESIZE - 1);
        lineLength = strlen(line);
    }
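
For comparison, the same letter-by-letter scan can be written with std::string, which avoids the fixed-size buffers. This is only a sketch under assumptions: the helper name lettersOnlyWords is made up here, and it returns the words instead of calling the tree's Insert, since the tree class is not shown in the post.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns every run of letters in `line`, uppercased.
std::vector<std::string> lettersOnlyWords(const std::string& line)
{
    std::vector<std::string> words;
    std::string word;
    for (char ch : line)
    {
        // <cctype> functions need a value representable as unsigned char
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalpha(c))
        {
            word.push_back(static_cast<char>(std::toupper(c)));
        }
        else if (!word.empty())
        {
            words.push_back(word); // a non-letter ends the current word
            word.clear();
        }
    }
    if (!word.empty())
        words.push_back(word); // the end of the line also ends a word
    return words;
}
```

Each line read with std::getline(fin, line) could be passed to this helper, and each returned word handed to tree.Insert.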

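This is a good occasion for a technique I have posted a few times before: define a ctype facet that classifies everything except letters as whitespace (searching for imbue will turn up several examples).

From there, it is a matter of std::transform, with istream_iterators on the input side, a std::set for the output, and a lambda to capitalize the first letter.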
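
A minimal sketch of that idea follows. The names letters_only, readWords and readWordsFromText are invented for illustration, and the lambda uppercases every letter (as the question asks) rather than only the first one.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <locale>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// ctype facet that classifies every character except A-Z and a-z as whitespace.
struct letters_only : std::ctype<char>
{
    letters_only() : std::ctype<char>(get_table()) {}

    static const std::ctype_base::mask* get_table()
    {
        // Built once: everything is "space", then the letters are marked "alpha".
        static const std::vector<std::ctype_base::mask> table = [] {
            std::vector<std::ctype_base::mask> t(std::ctype<char>::table_size,
                                                 std::ctype_base::space);
            for (int c = 'A'; c <= 'Z'; ++c) t[c] = std::ctype_base::alpha;
            for (int c = 'a'; c <= 'z'; ++c) t[c] = std::ctype_base::alpha;
            return t;
        }();
        return table.data();
    }
};

// Reads all words from `in` into a set (so duplicates collapse), uppercased.
std::set<std::string> readWords(std::istream& in)
{
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new letters_only));
    std::set<std::string> words;
    std::transform(std::istream_iterator<std::string>(in),
                   std::istream_iterator<std::string>(),
                   std::inserter(words, words.end()),
                   [](std::string w) {
                       for (char& ch : w)
                           ch = static_cast<char>(
                               std::toupper(static_cast<unsigned char>(ch)));
                       return w;
                   });
    return words;
}

// Convenience wrapper for trying the facet on an in-memory string.
std::set<std::string> readWordsFromText(const std::string& text)
{
    std::istringstream in(text);
    return readWords(in);
}
```

With the file stream from the question, the call would simply be readWords(fin). Because operator>> now treats commas, periods and newlines as whitespace, no extra splitting code is needed.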

You can write a custom getline function that accepts multiple delimiters:

std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
    str.clear();
    // the 3rd parameter type and the condition part on the right side of &&
    // should be all that differs from std::getline
    for(char c; is.get(c) && delims.find(c) == std::string::npos; )
        str.push_back(c);
    return is;
}

And use it:

getline(fin, input, " \n,.");
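
Two details are worth handling in the read loop: consecutive delimiters such as ", " produce empty strings, and the last word comes back together with a failed stream when the input does not end in a delimiter. A hedged sketch of a loop that covers both cases (the helper splitWords is hypothetical, and it reads from an in-memory string so that it runs without a file):

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// The multi-delimiter getline from the answer, repeated so this compiles on its own.
std::istream &getline(std::istream &is, std::string &str, std::string const& delims)
{
    str.clear();
    for (char c; is.get(c) && delims.find(c) == std::string::npos; )
        str.push_back(c);
    return is;
}

// Hypothetical helper: splits `text` on space, newline, comma and period.
std::vector<std::string> splitWords(const std::string& text)
{
    std::istringstream in(text); // stands in for the ifstream from the question
    const std::string delims = " \n,.";
    std::vector<std::string> words;
    std::string input;
    // Continue while the read succeeded, or while it hit the end of the
    // input but still produced a final word.
    while (getline(in, input, delims) || !input.empty())
    {
        if (input.empty())
            continue; // two delimiters in a row
        for (char& ch : input)
            ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
        words.push_back(input);
    }
    return words;
}
```

Any character that is not in the delimiter string (an exclamation mark, for example) still ends up inside a word, so the delimiter list has to cover everything the file can contain.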

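You can use std::regex to select your tokens.

Depending on the size of the file, you can read it line by line or read it entirely into a std::string.

To read the file, you can use: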

std::ifstream t("file.txt");
std::string sin((std::istreambuf_iterator<char>(t)),
                 std::istreambuf_iterator<char>());

This matches the space-separated strings:

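// The pattern is garbled in the post; "[^,\\s]+" (runs of characters that are
// neither commas nor whitespace) is an assumed reconstruction.
std::regex word_regex("[^,\\s]+");
auto what =
    std::sregex_iterator(sin.begin(), sin.end(), word_regex);
auto wend = std::sregex_iterator();
std::vector<std::string> v;
for (; what != wend; ++what) {
    std::smatch match = *what;
    v.push_back(match.str());
}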

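I think that to separate tokens delimited by commas, spaces, or newlines, you should use the following regex: (,| \n| )[[:alpha:]].+. I have not tested it, so you may need to check it.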
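
Since the goal is to keep only letters, a simpler pattern may be enough: [[:alpha:]]+ matches each run of letters directly, so no delimiter list is needed. A small sketch, assuming the whole text is already in a std::string (regexWords is a made-up name, and the uppercasing follows the question rather than this answer):

```cpp
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every run of letters in `text`, uppercased.
std::vector<std::string> regexWords(const std::string& text)
{
    static const std::regex wordRegex("[[:alpha:]]+");
    std::vector<std::string> words;
    // A default-constructed sregex_iterator marks the end of the matches.
    for (std::sregex_iterator it(text.begin(), text.end(), wordRegex), end;
         it != end; ++it)
    {
        std::string w = it->str();
        for (char& ch : w)
            ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
        words.push_back(w);
    }
    return words;
}
```

Each returned word could then be passed to the tree's insert function in the same way as in the other answers.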