Splitting paragraph of separate text file into separate strings

Updated: 2023-10-16

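I'd like some advice/help on splitting a paragraph from a separate text file into its own strings. So far my code only counts the total number of words in the paragraph, but I would like to split it so that each line is one sentence, then count the number of words in that sentence/line, and then put that into its own array so I can do other things with that specific sentence/line. Here is my code: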

#include <iostream>
#include <string>
#include <fstream>
#include <cstdlib>   // for system()
using namespace std;
int main()
{
    std::ifstream inFile;
    inFile.open("Rhymes.txt", std::ios::in);
    if (inFile.is_open())
    {
        string word;
        unsigned long wordCount = 0;
        // Loop on the extraction itself: testing eof() before reading
        // would count the last word twice when the file ends in a newline.
        while (inFile >> word)
        {
            wordCount++;
        }
        cout << "The file had " << wordCount << " word(s) in it." << endl;
    }

    system("PAUSE"); // Windows-only: keeps the console window open
    return 0;
}

The separate text file is called "Rhymes.txt" and contains:

Today you are You, that is truer than true. There is no one alive who is Youer than You.
The more that you read, the more things you will know. The more that you learn, the more places you'll go.
How did it get so late so soon? Its night before its afternoon.
Today was good. Today was fun. Tomorrow is another one.
And will you succeed? Yes indeed, yes indeed! Ninety-eight and three-quarters percent guaranteed!
Think left and think right and think low and think high. Oh, the things you can think up if only you try!
Unless someone like you cares a whole awful lot, nothing is going to get better. It's not.
I'm sorry to say so but, sadly it's true that bang-ups and hang-ups can happen to you.

So the first line is its own sentence, and when the code runs it should print:

The line has 19 words in it

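I'm also a bit confused about how I should go about this. I've seen examples of splitting sentences into words, but I couldn't find anything related to what I'm asking that I could really understand.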

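Assuming that every gap between words is exactly one space character and there is no leading or trailing padding, you can count the words with std::count (number of spaces + 1). Reading in each line can be done with std::getline.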

#include <algorithm>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

int main()
{
    // Simulating the file (use std::ifstream inFile("Rhymes.txt"); from <fstream> for the real one):
    std::istringstream inFile(
R"(Today you are You, that is truer than true. There is no one alive who is Youer than You.
The more that you read, the more things you will know. The more that you learn, the more places you'll go.
How did it get so late so soon? Its night before its afternoon.
Today was good. Today was fun. Tomorrow is another one.
And will you succeed? Yes indeed, yes indeed! Ninety-eight and three-quarters percent guaranteed!
Think left and think right and think low and think high. Oh, the things you can think up if only you try!
Unless someone like you cares a whole awful lot, nothing is going to get better. It's not.
I'm sorry to say so but, sadly it's true that bang-ups and hang-ups can happen to you.)");
    std::vector<std::string> lines; // This vector will contain all lines.
    for (std::string str; std::getline(inFile, str, '\n');)
    {
        std::cout << "The line has " << std::count(str.begin(), str.end(), ' ') + 1 << " words in it\n";
        lines.push_back(std::move(str)); // Avoid the copy.
    }
    for (auto const& s : lines)
        std::cout << s << '\n';
}
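The std::count(' ') + 1 rule above only holds under the single-space assumption: for a line like "  two   words " it would report 7, and for an empty line it reports 1. A minimal sketch of a more tolerant count, assuming "word" means any whitespace-separated token (count_words is a hypothetical helper name, not part of either answer):

```cpp
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated tokens in one line.
// Unlike std::count(..., ' ') + 1, this tolerates repeated spaces, tabs,
// and leading/trailing blanks, and returns 0 for an empty line.
std::size_t count_words(const std::string& line)
{
    std::istringstream in(line);
    // operator>> skips any amount of whitespace before each token,
    // so distance() between the two iterators is the token count.
    return static_cast<std::size_t>(std::distance(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>()));
}
```

It costs a stream per line, but it matches how operator>> already splits words in the question's original loop.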

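If you need each sentence's word count later, store a std::pair<std::string, std::size_t> holding the line and its word count: declare lines as std::vector<std::pair<std::string, std::size_t>> (and print s.first in the final loop), then change the loop body to: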

        std::size_t count = std::count(str.begin(), str.end(), ' ') + 1;
        std::cout << "The line has " << count << " words in it\n";
        lines.emplace_back(std::move(str), count);
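Putting the pair-based variant together, a minimal sketch might look like the following. It keeps the answer's single-space counting rule; read_counted_lines and read_counted_lines_from_text are hypothetical names, and the string overload only mirrors how the answer simulates the file with std::istringstream:

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads every line from `in` and pairs it with its
// word count, using the answer's std::count(' ') + 1 rule (so it assumes
// exactly one space between words).
std::vector<std::pair<std::string, std::size_t>> read_counted_lines(std::istream& in)
{
    std::vector<std::pair<std::string, std::size_t>> lines;
    for (std::string str; std::getline(in, str);)
    {
        std::size_t count =
            static_cast<std::size_t>(std::count(str.begin(), str.end(), ' ')) + 1;
        lines.emplace_back(std::move(str), count);
    }
    return lines;
}

// Convenience overload for in-memory text, like the answer's simulated file.
std::vector<std::pair<std::string, std::size_t>> read_counted_lines_from_text(const std::string& text)
{
    std::istringstream in(text);
    return read_counted_lines(in);
}
```

With the real file, something like std::ifstream file("Rhymes.txt"); auto lines = read_counted_lines(file); should work, after which lines[i].first is the line and lines[i].second its word count.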

I would write something like this:

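#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Reads one line from standard input and splits it into words.
vector<string> read_line()
{
    string line, w;
    vector<string> words;
    getline(cin, line);
    stringstream ss(line);
    while (ss >> w)          // operator>> skips any run of whitespace
        words.push_back(w);
    return words;
}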

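The returned vector contains the information you need: the number of words and the words themselves (with punctuation still attached, which you can easily remove).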

vector<string> words = read_line();
cout << "This line has " << words.size() << " words in it" << endl;
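As for removing the punctuation: one possible approach, assuming only punctuation at the start or end of a word should go, is to trim characters for which std::ispunct is true. strip_punct is a hypothetical helper, not part of the answer:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: removes leading and trailing punctuation from a
// word ("You," -> "You", "indeed!" -> "indeed") while keeping inner
// characters such as the apostrophe in "you'll" or the hyphen in "bang-ups".
std::string strip_punct(const std::string& w)
{
    auto is_p = [](char c) { return std::ispunct(static_cast<unsigned char>(c)) != 0; };
    auto first = std::find_if_not(w.begin(), w.end(), is_p);
    // base() of the reverse iterator points one past the last kept character.
    auto last = std::find_if_not(w.rbegin(), w.rend(), is_p).base();
    if (first >= last)
        return std::string();   // the word was empty or all punctuation
    return std::string(first, last);
}
```

Trimming only the ends keeps contractions and hyphenated words intact, which a blanket std::remove_if over every punctuation character would not.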

To read all the lines, do:

while (true)
{
    vector<string> words = read_line();
    if (words.empty()) break; // end of input (note: a blank line also ends the loop)
    // process line
}
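Note that read_line reads from std::cin, so the program has to be run with Rhymes.txt redirected to standard input, and the loop above stops at the first blank line. A sketch of a variant that takes the stream as a parameter and reports end of input separately, assuming that behaviour is wanted (this read_line overload and read_all_lines are hypothetical):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of read_line that reads from any input stream
// (for example a std::ifstream opened on "Rhymes.txt") instead of std::cin.
// The words come back through `words`; the bool result says whether a line
// was read at all, so a blank line no longer ends the loop early.
bool read_line(std::istream& in, std::vector<std::string>& words)
{
    std::string line, w;
    words.clear();
    if (!std::getline(in, line))
        return false;            // end of input or read error
    std::istringstream ss(line);
    while (ss >> w)
        words.push_back(w);
    return true;
}

// Reads every line of `in` into its own vector of words.
std::vector<std::vector<std::string>> read_all_lines(std::istream& in)
{
    std::vector<std::vector<std::string>> lines;
    std::vector<std::string> words;
    while (read_line(in, words))
        lines.push_back(words);
    return lines;
}
```

For example, with std::ifstream file("Rhymes.txt"); the call read_all_lines(file) should yield one vector per line, where lines[i].size() is that line's word count.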