C++ file reading and splitting columns

Keywords: split, read, file, C++      Updated: 2023-10-16

I'm not familiar with file reading in C++, but I've done a lot of work with PySpark. I now have a txt file with the following content:

1   52  Hayden Smith        18:16   15  M   Berlin
2   54  Mark Puleo      18:25   15  M   Berlin
3   97  Peter Warrington    18:26   29  M   New haven
4   305 Matt Kasprzak       18:53   33  M   Falls Church
5   272 Kevin Solar     19:17   16  M   Sterling
6   394 Daniel Sullivan     19:35   26  M   Sterling
7   42  Kevan DuPont        19:58   18  M   Boylston
8   306 Chris Goethert      20:00   43  M   Falls Church

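As you can see, there are 8 columns and 351 rows (only 8 rows are shown). In each row, [0] is the rank, [1] is the BIB, [2] is the first name, [3] is the last name, [4] is the time, [5] is the age, [6] is the gender, and [7] is the town. For example, in the first row, 1 is the rank, 52 is the BIB, Hayden Smith is the name, 18:16 is the time, 15 is the age, M means male, and Berlin is the town.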

I have a sorted linked list class called SortedLinked and an item type class called Runner.

You don't need to worry about the SortedLinked class.

The Runner class has four private attributes:

string name, int age, int min, int sec
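Only the four private members are given, so the class below is a hypothetical sketch of what Runner might look like: the constructor order (name, age, min, sec) is inferred from the driver call `Runner M("Jordan", 22, 20, 20)`, while the accessor names and `totalSeconds()` are assumptions, not the asker's code.

```cpp
#include <string>
#include <utility>

// Hypothetical sketch: the four private members come from the question;
// the constructor order is inferred from Runner M("Jordan", 22, 20, 20),
// and the accessor names are assumptions.
class Runner {
public:
    Runner(std::string name, int age, int min, int sec)
        : name_(std::move(name)), age_(age), min_(min), sec_(sec) {}

    const std::string& name() const { return name_; }
    int age() const { return age_; }
    int minutes() const { return min_; }
    int seconds() const { return sec_; }

    // Finishing time in seconds (assumed helper), usable as a sort key.
    int totalSeconds() const { return min_ * 60 + sec_; }

private:
    std::string name_;
    int age_;
    int min_;
    int sec_;
};
```

A single `totalSeconds()` value is one possible key for ordering runners, but how SortedLinked actually compares items isn't stated in the question.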

In my driver file, I can do this:

SortedLinked mylist;                  // initialize a sorted list
Runner M("Jordan", 22, 20, 20);       // initialize a Runner called Jordan, who is 22 years old and finished the race in 20 min 20 sec
mylist.add(M);                        // add Runner M into my sorted list

So I need to read the text file, create a Runner object with each runner's name, age, minutes and seconds, and insert that Runner into the sorted linked list.

If this were PySpark, I could do this:

file = sc.textFile("hdfs")    # we usually read from HDFS in PySpark
newfile = file.map(lambda line: line.split())    # columns are tab-separated, but [2] and [3] are separated by a space, so split on any whitespace
ColumnIneed = newfile.map(lambda r: [r[2], r[3], r[4], r[5]])    # I only need columns [2], [3], [4], [5]
mylist = ColumnIneed.collect()    # transform the RDD into a list
# Then I can just transform every row into a Runner object.
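A rough C++ counterpart of the `split` step, as a sketch: reading from a `std::istringstream` with `>>` splits on any run of whitespace (tabs and spaces alike), much like Python's `str.split()` with no argument. The function name `splitWhitespace` is illustrative.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a line on any run of whitespace (tabs or spaces), like Python's str.split().
std::vector<std::string> splitWhitespace(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> tokens;
    std::string token;
    while (iss >> token) {   // >> skips leading whitespace and stops at the next whitespace
        tokens.push_back(token);
    }
    return tokens;
}
```

For the first sample row this yields "1", "52", "Hayden", "Smith", "18:16", "15", "M", "Berlin", so indices [2] to [5] match the column layout described above. A two-word town such as "New haven" becomes two tokens, which doesn't matter here because only [2] to [5] are used.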

But in C++, this is all I know:

ifstream infile;
infile.open(/*path to file*/);
string s, sAll;
if(infile.is_open())
{
   while(getline(infile, s))
   {
      s = s.rstrip('\n');              //does NOT work in C++
      name, age, time = s.split('\t'); //does NOT work in C++, and I don't need all the columns
   }
}
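`std::string` has no `rstrip` or `split` member, but both are short to write. A sketch: `std::getline` already discards the `'\n'`, so the only thing left to strip is a trailing `'\r'` if the file has Windows line endings; splitting on one delimiter can reuse `std::getline` with its third (delimiter) argument. The helper names `stripLineEnding` and `split` are illustrative.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Remove trailing '\r' / '\n' characters. getline already drops the '\n',
// but files written on Windows leave a '\r' behind.
std::string stripLineEnding(std::string s) {
    while (!s.empty() && (s.back() == '\r' || s.back() == '\n')) {
        s.pop_back();
    }
    return s;
}

// Split on a single delimiter character such as '\t'. Empty fields between two
// consecutive delimiters are kept; unlike Python, a trailing empty field is not.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> fields;
    std::istringstream iss(s);
    std::string field;
    while (std::getline(iss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

With a tab split, "Hayden Smith" stays one field (first and last name are separated by a space), so name, time and age land at indices 2, 3 and 4 rather than [2]/[3], [4] and [5].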

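So, here is what I need to do:

1. Access each line and strip the newline character.

2. Keep only columns [2], [3], [4] and [5] (the columns are separated by tabs).

3. Column [4] is the time, stored as a string in the text file; I need to split it on ":" and store the parts as minutes and seconds.

4. Columns [2] and [3] are the first and last name; I need to combine them into a single name string.

5. Columns [2] and [3] are separated by a space.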
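Steps 3 and 4 can be sketched as two small helpers (the names `parseTime` and `fullName` are illustrative): `find(':')` locates the separator, `substr` cuts out the two halves, and `std::stoi` converts each to `int`. Because `std::stoi` throws `std::invalid_argument` on non-numeric input, a malformed time surfaces as an exception rather than a silent zero.

```cpp
#include <stdexcept>
#include <string>

// Split a "mm:ss" string such as "18:16" into minutes and seconds.
// Throws std::invalid_argument if there is no ':' or a part is not a number.
void parseTime(const std::string& time, int& min, int& sec) {
    std::string::size_type colon = time.find(':');
    if (colon == std::string::npos) {
        throw std::invalid_argument("time has no ':' : " + time);
    }
    min = std::stoi(time.substr(0, colon));
    sec = std::stoi(time.substr(colon + 1));
}

// Combine first and last name (columns [2] and [3]) into one string.
std::string fullName(const std::string& first, const std::string& last) {
    return first + " " + last;
}
```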

So ideally, I'd like to do something like this:

while(I need a loop)
{
   eachline = access each line;
   eachline.strip('\n')  //strip newline
   eachline.split('\t')  //split on tabs
   string name = eachline[2][3];
   string time = eachline[4];
   int min;
   int sec;
   min, sec = time.split(':')
   int age = eachline[5];
   Runner M(name, age, min, sec);   //I don't know if this works, because it looks like I'm overwriting Runner M each time I read a new line.
   mylist.add(M);     //add M into my linked list; you don't need to worry about this step, I've already finished it.
}
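The whole loop can be sketched in real C++ as follows. To keep it self-contained, `RunnerRecord` is a hypothetical stand-in for the Runner class, and the function reads from any `std::istream` (an `std::ifstream` in the real program). It assumes every runner has exactly one first-name word and one last-name word, as in the sample rows; a three-part name would shift the columns.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the asker's Runner class, so the sketch compiles
// on its own; with the real class, build a Runner and call mylist.add() instead.
struct RunnerRecord {
    std::string name;
    int age;
    int min;
    int sec;
};

// Read "rank bib first last mm:ss age gender town" rows and keep name, age, min, sec.
std::vector<RunnerRecord> readRunners(std::istream& in) {
    std::vector<RunnerRecord> runners;
    std::string line;
    while (std::getline(in, line)) {        // one row per iteration, '\n' already removed
        std::istringstream fields(line);
        int rank = 0, bib = 0, age = 0, min = 0, sec = 0;
        std::string first, last;
        char colon = 0;
        // >> splits on tabs and spaces alike; "18:16" is read as int, char, int.
        if (!(fields >> rank >> bib >> first >> last >> min >> colon >> sec >> age) || colon != ':') {
            continue;                        // skip blank or malformed lines
        }
        // A fresh record is created on every iteration and push_back stores a copy,
        // so nothing from the previous row is overwritten.
        RunnerRecord r{first + " " + last, age, min, sec};
        runners.push_back(r);
    }
    return runners;
}
```

In the driver, each record could then become `Runner runner(r.name, r.age, r.min, r.sec); mylist.add(runner);` inside a loop over the result. Gender and town are simply left unread, since only columns [2] to [5] are needed.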

If you have a better way, I'd appreciate it.

Some code snippets:

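    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    std::ifstream in;
    in.open(/*path to file*/);
    std::string line;
    if(in.is_open())
    {
        while(std::getline(in, line)) //get 1 row as a string; getline already drops the '\n'
        {
            std::istringstream iss(line); //put line into stringstream
            std::string word;
            while(iss >> word) //read word by word; >> splits on tabs and spaces alike
            {
                std::cout << word << std::endl;
            }
            /*
            std::istringstream fields(line); // fresh stream, since the loop above already consumed iss
            int rank, bib, age, min, sec;
            char colon;
            std::string first, last;
            fields >> rank >> bib >> first >> last >> min >> colon >> sec >> age; // adapt to your input line; "18:16" is read as int, ':', int
            Runner runner(first + " " + last, age, min, sec);
            // By common convention, variable names shouldn't start with a capital letter.
            // You don't overwrite M from the question: each iteration creates a new local Runner,
            // add() puts a copy of it into the container, and the local is destroyed at the end of the block.
            // You could probably use move semantics here, but you need the C++ basics first.
            mylist.add(runner);
            */
        }
    }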