Read words from paragraph to paragraph, from line to line (C++)

Keywords: paragraph, line, word, C++, another, one      Updated: 2023-10-16

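I am looking for a way to read a file and locate words in it (by line number and paragraph number).

For example, I want to track the occurrences of the word "you" in the file. Each time the word is found on a line, I push the line number and the paragraph number onto two vectors.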

ifstream file;
file.open("input.txt");
vector<int> paragraph_number;
vector<int> line_number; 

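What is the best way to read the file paragraph by paragraph and line by line? Thanks!

Line numbers are fairly simple, since you can just use getline or something similar to read one line at a time. Just keep track of how many lines you have read from the file. Alternatively, you can count the number of newline characters (\n) you run across.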
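A minimal sketch of the getline approach. The helper name count_lines is made up for illustration, and it takes any std::istream so it can be tried on a string as well as on a file:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): counts the lines in a stream.
// Each successful std::getline call corresponds to exactly one line, so the
// counter value inside the loop is the 1-based number of the current line.
std::size_t count_lines(std::istream& in)
{
    std::string line;
    std::size_t line_num = 0;
    while (std::getline(in, line))
        ++line_num;
    return line_num;
}
```

With a std::ifstream the call is the same; inside the loop, line_num is the value you would push onto line_number when the word is found.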

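Paragraphs are a bit trickier, because there is no standardized way to recognize paragraphs in a file. You will probably need some kind of delimiter for the end of a paragraph. You could interpret two consecutive newlines as the start of a new paragraph, but that part is up to you.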
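One possible convention, sketched below, is to treat any run of empty lines as a paragraph separator. This is an assumption about the file format, not a rule, and the helper name paragraph_of_each_line is made up for illustration:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for every line read, stores the 1-based number of the
// paragraph it belongs to, or 0 if the line is an empty separator line.
// Assumption: paragraphs are separated by one or more completely empty lines.
std::vector<std::size_t> paragraph_of_each_line(std::istream& in)
{
    std::vector<std::size_t> result;
    std::string line;
    std::size_t paragraph_num = 0;
    bool in_paragraph = false;
    while (std::getline(in, line)) {
        if (line.empty()) {
            in_paragraph = false;      // the next non-empty line opens a new paragraph
            result.push_back(0);
        } else {
            if (!in_paragraph) {
                in_paragraph = true;
                ++paragraph_num;
            }
            result.push_back(paragraph_num);
        }
    }
    return result;
}
```

Several empty lines in a row still count as a single separator here, because the paragraph counter only advances when a non-empty line follows them.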

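Assumptions:

  • paragraphs are separated by at least one empty line, that is, a line containing only the newline

  • a line containing only spaces is not considered empty; there is no real reason for that, and I leave it to you to change it ;-)

  • the program records the paragraph, line and column numbers at which the word appears; all these numbers start at 1, and the line number is global rather than the rank of the line within its paragraph

  • a word contains only alphanumeric characters, so every other character is treated as a separator. This makes it possible to find the word "isn" or "t" in "this isn't possible" even though they are not separated from the other words by spaces, or "jean" in "jean-luc", etc.

  • the program does not check that the word given as argument is a valid word

Proposal: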

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <cctype>
int main(int argc, char ** argv)
{
  if (argc != 3)
      std::cerr << "Usage: " << *argv << " <file path> <word>" << std::endl;
  else {
    std::ifstream f(argv[1]);
    if (! f.is_open())
      std::cerr << "Cannot open '" << argv[1] << '\'' << std::endl;
    else {
      std::string word = argv[2];
      std::string line;
      size_t line_num = 0;
      size_t paragraph_num = 0;
      std::vector<size_t> paragraph_number; 
      std::vector<size_t> line_number;
      std::vector<size_t> column_number;
      bool afterEmptyLine = true;
      while (std::getline(f, line)) {
        line_num += 1;
        if (!line.empty()) {
          if (afterEmptyLine) {
            afterEmptyLine = false;
            paragraph_num += 1;
          }
          std::size_t p = 0;
          while ((p = line.find(word, p)) != std::string::npos) {
            // check it is not a subword, suppose a word is only alphanum
            if (((p == 0) || !isalnum(static_cast<unsigned char>(line[p - 1]))) &&
                ((line.length() == (p + word.length())) ||
                 !isalnum(static_cast<unsigned char>(line[p + word.length()])))) {
              paragraph_number.push_back(paragraph_num);
              line_number.push_back(line_num);
              column_number.push_back(p + 1);
            }
            p += word.length();
          }
        }
        else
          afterEmptyLine = true;
      }
      /* debug */
      std::cout << '\'' << word << "' found " << paragraph_number.size() << " times :" << std::endl;
      for (size_t i = 0; i != paragraph_number.size(); ++i)
        std::cout << "\t paragraph " << paragraph_number[i] 
          << " line " << line_number[i]
            << " column " << column_number[i] << std::endl;
    }
  }
  return 0;
}

Compilation and execution:

bruno@bruno-XPS-8300:/tmp$ g++ -pedantic -Wextra -Wall c.cc
bruno@bruno-XPS-8300:/tmp$ cat fw
is it you or not you?
this is your decision and you are right

you and me



you
bruno@bruno-XPS-8300:/tmp$ ./a.out
Usage: ./a.out <file path> <word>
bruno@bruno-XPS-8300:/tmp$ ./a.out fw you
'you' found 5 times :
     paragraph 1 line 1 column 7
     paragraph 1 line 1 column 18
     paragraph 1 line 2 column 27
     paragraph 2 line 4 column 1
     paragraph 3 line 8 column 1
bruno@bruno-XPS-8300:/tmp$ 

(In the file, the empty lines are really empty.)

Try something like this:
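If lines containing only whitespace should also separate paragraphs, which is the change left open in the assumptions, a small predicate along these lines could replace the line.empty() test. The name is_blank is made up for illustration:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true when the line is empty or holds only whitespace
// (spaces, tabs, or a stray '\r' left over from Windows line endings).
bool is_blank(const std::string& line)
{
    return std::all_of(line.begin(), line.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}
```

In the program above this would mean writing if (!is_blank(line)) instead of if (!line.empty()). The cast to unsigned char happens in the lambda parameter, which keeps std::isspace well defined for non-ASCII bytes.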

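// snippet: assumes <fstream>, <sstream>, <string> and <vector> are included,
// plus `using namespace std;`
ifstream file("input.txt");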
vector<int> paragraph_number;
vector<int> line_number;
string line, word;
int curr_paragraph_num = 0;
int curr_line_num = 0;
bool in_paragraph = false;
while (getline(file, line))
{
    ++curr_line_num;
    if (line.empty())
    {
        in_paragraph = false;
    }
    else
    {
        if (!in_paragraph)
        {
            in_paragraph = true;
            ++curr_paragraph_num;
        }
        istringstream iss(line);
        while (iss >> word)
        {
            if (word == "you")
            {
                paragraph_number.push_back(curr_paragraph_num);
                line_number.push_back(curr_line_num);
            }
        }
    }
}
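Note that iss >> word splits on whitespace only, so a token such as "you?" or "you," does not compare equal to "you". One way around that, sketched here under the assumption that a word is made of alphanumeric characters, is to strip the punctuation from both ends of each token before comparing. The helper name strip_punctuation is made up for illustration:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: removes leading and trailing non-alphanumeric
// characters from a whitespace-delimited token. Characters inside the
// token (such as the '-' in "jean-luc") are left untouched.
std::string strip_punctuation(const std::string& token)
{
    std::size_t first = 0;
    std::size_t last = token.size();
    while (first < last && !std::isalnum(static_cast<unsigned char>(token[first])))
        ++first;
    while (last > first && !std::isalnum(static_cast<unsigned char>(token[last - 1])))
        --last;
    return token.substr(first, last - first);
}
```

The comparison in the loop would then read if (strip_punctuation(word) == "you"). Unlike the first answer, this still treats "jean-luc" as a single word, so pick whichever definition of a word suits your data.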