Count first digit on each line of a text file

Keywords: first digit, text file, count. Last updated: 2023-10-16

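My project takes a filename and opens the file. I need to read each line of the .txt file up to the first digit, skipping whitespace, letters, zeros, and special characters. My text file looks like this: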

1435                 //1, nextline
0                   //skip, next line
                    //skip, nextline
(*Hi 245*) 2       //skip until second 2 after comment and count, next line
345 556           //3 and count, next line 
4                //4, nextline

The output I want goes all the way up to 9, but I have shortened it here:

Digit Count Frequency
1:      1     .25
2:      1     .25
3:      1     .25
4:      1     .25

My code so far:

    #include <iostream>
    #include <fstream>
    #include <string>
    using namespace std;

    int main() {
        int digit = 1;
        int array[9] = {0}; // nine slots, one per digit 1-9 (not used yet)
        string filename;
        // cout for getting user path
        // the compiler parses string literals differently so use a double backslash or a forward slash
        cout << "Enter the path of the data file, be sure to include extension." << endl;
        cout << "You can use either of the following:" << endl;
        cout << "A forwardslash or double backslash to separate each directory." << endl;
        getline(cin, filename);
        ifstream input_file(filename.c_str());
        if (input_file.is_open()) { // if file is open
            cout << "open" << endl; // just a coding check to make sure it works, ignore
            string fileContents; // string to store contents
            string temp;
            // Loop on getline itself rather than on eof(), so a failed read ends the loop
            while (getline(input_file, temp)) {
                fileContents.append(temp); // appends the line to the string
                fileContents.push_back('\n'); // keep the line break that getline drops
            }
            cout << fileContents << endl; // prints string for test
        }
        else {
            cout << "Error opening file check path or file extension" << endl;
        }
        return 0;
    }

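In this file format, (* marks the start of a comment, so everything from there to the matching *) should be ignored (even if it contains a digit). For example, given the input (*Hi 245*) 6, the 6 should be counted, not the 2.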

How can I iterate over the file, find only the first integer on each line and count it, while ignoring comments?

One way to approach this problem:

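Create a std::map<int, int> whose keys are the digits and whose values are the counts. This lets you compute statistics on the digits afterward, such as counts and frequencies. Something similar can be found in this SO answer.
Read each line of the file into a std::string with std::getline, as shown in this SO answer.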

For each line, strip the comments with a function like this:

std::string& strip_comments(std::string & inp, 
                            std::string const& beg, 
                            std::string const& fin = "") {
  std::size_t bpos;
  while ((bpos = inp.find(beg)) != std::string::npos) {
    if (fin != "") {
      std::size_t fpos = inp.find(fin, bpos + beg.length());
      if (fpos != std::string::npos) {
        inp = inp.erase(bpos, fpos - bpos + fin.length());
      } else {
        // else don't erase because fin is not found, but break
        break;
      }
    } else {
      inp = inp.erase(bpos, inp.length() - bpos);
    }
  }
  return inp;
}

It can be used like this:

std::string line;
std::getline(input_file, line);
line = strip_comments(line, "(*", "*)");

After removing the comments, use the string member function find_first_of to find the first digit:
```
std::size_t dpos = line.find_first_of("123456789");
```
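This returns the index of the first digit in the string. Check that the returned position is not std::string::npos, which indicates that no digit was found. If a first digit is found, extract the corresponding character with const char c = line[dpos]; and convert it to an integer with c - '0'. (std::atoi expects a null-terminated C string, so it cannot be applied to a single char directly.)
Increment the count for that digit in the std::map, as shown in the first linked SO answer. Then loop to read the next line.
After all lines have been read from the file, the std::map will contain the counts of all first digits found on each line with comments removed. You can then iterate over the map to retrieve all the counts, accumulate the total over all digits found, and compute the frequency of each digit. Note that digits that were never found will not be present in the map.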

I hope this helps you get started. I'll leave writing the code to you. Good luck!