Reading from a large text file into a structure array in Qt?

Keywords: structure array, Qt, reading large text files. Last updated: 2023-10-16

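I have to read a text file into an array of structures. I have already written a program, but it takes far too long because the file contains about 13 lakh (1.3 million) structures. Please suggest the best and fastest way to do this in C++.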

Here is my code:

#include <fstream>
#include <string>

std::ifstream input_counter("D:\\cont.txt"); // backslashes must be escaped in C++ string literals
std::string line;
int counter = 0;
while (std::getline(input_counter, line)) // getline replaces the previous contents of line
{
    ReadCont(line, &contract[counter]); // function to read data to structure
    counter++;
}
input_counter.close();

Keep your "parsing" as simple as possible: wherever you know the format of a field, use that knowledge. For example,

ReadCont("|PE|1|0|0|0|0|1|1||2|0||2|0||3|0|....", ...)

should apply a fast character-to-integer conversion, something like

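#include <cassert>

// Contract, int_field and K_FIELDS_PE come from your own code; assumes single-digit fields
void ReadCont(const char *line, Contract &c) {
   if (line[1] == 'P' && line[2] == 'E' && line[3] == '|') {
     line += 4;                              // skip the "|PE|" prefix
     for (int field = 0; field < K_FIELDS_PE; ++field) {
       c.int_field[field] = *line++ - '0';   // one digit -> int, no atoi/strtol call
       assert(*line == '|');
       ++line;                               // skip the separator
     }
   }
}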

Well, be careful with the details, but you get the idea...
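The loop above assumes every field is exactly one digit, but the sample line also contains empty fields (||). Below is a hedged, self-contained sketch in standard C++ that keeps the same hand-rolled conversion while tolerating multi-digit and empty fields. The name parsePeLine, collecting the fields into a std::vector<int>, and treating an empty field as 0 are all assumptions for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: parses "|PE|1|0||23|..." into integers.
// Assumes non-negative integer fields; an empty field is stored as 0.
// Returns false if the line does not start with "|PE|".
bool parsePeLine(const char *line, std::vector<int> &fields)
{
    assert(line != nullptr);
    fields.clear();
    if (line[0] != '|' || line[1] != 'P' || line[2] != 'E' || line[3] != '|')
        return false;
    line += 4;                                     // skip the "|PE|" prefix
    while (*line != '\0' && *line != '\n' && *line != '\r') {
        int value = 0;
        while (*line >= '0' && *line <= '9')
            value = value * 10 + (*line++ - '0');  // manual char-to-int conversion
        fields.push_back(value);
        if (*line != '|')                          // end of line or unexpected character
            break;
        ++line;                                    // skip the separator
    }
    return true;
}
```

Used on the sample line, parsePeLine("|PE|1|0|0|0|0|1|1||2|0||2|0||3|0|", fields) yields 16 values, with each empty field stored as 0.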

In this case I would do it entirely with Qt.

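#include <QByteArray>
#include <QFile>

struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
    // ... more fields
};
QByteArray Data;
QFile f("D:/cont.txt"); // forward slashes work on Windows and avoid the "\c" escape bug
if (f.open(QIODevice::ReadOnly)) {
    Data = f.readAll();
    f.close();
}
// Only meaningful if the file holds raw MyStruct records (binary, same layout)
MyStruct* DataPointer = reinterpret_cast<MyStruct*>(Data.data());
const int RecordCount = Data.size() / int(sizeof(MyStruct));
// Accessing data
if (RecordCount >= 2) {
    int first = DataPointer[0].Col1;
    int second = DataPointer[1].Col2;
}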

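Now you have the data and can access it as an array. This only works if the file stores the structures in raw binary form, with exactly the same layout (field sizes, padding, byte order) as MyStruct.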
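To make the binary assumption concrete, here is a hedged standard C++ sketch of the round trip: one function writes records as raw bytes, the other copies whole records back out of a byte buffer. It uses std::memcpy into a std::vector instead of reinterpret_cast, so alignment is not an issue and a truncated last record is simply dropped. The function names toBytes and fromBytes, and the std::string buffer standing in for the file, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
};

// Hypothetical writer: stores records as raw bytes, the format the cast expects.
std::string toBytes(const std::vector<MyStruct> &records)
{
    std::string bytes(records.size() * sizeof(MyStruct), '\0');
    if (!records.empty())
        std::memcpy(&bytes[0], records.data(), bytes.size());
    return bytes;
}

// Hypothetical reader: copies whole records out of a byte buffer
// (for example Data.constData() and Data.size()); a trailing partial record is ignored.
std::vector<MyStruct> fromBytes(const char *data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::vector<MyStruct> records(size / sizeof(MyStruct));
    if (!records.empty())
        std::memcpy(records.data(), data, records.size() * sizeof(MyStruct));
    return records;
}
```

Such a file is only portable between programs built with the same struct layout; a text file like the one in the question must be parsed instead.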

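If your data is not binary and you have to parse it first, you will need a conversion routine. For example, if you read a delimited (CSV-like) file with 4 columns: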

QVector<MyStruct> MyArray;
QString StringData = QString::fromUtf8(Data);
QStringList Lines = StringData.split('\n'); // or whatever the newline character is
MyArray.reserve(Lines.count());
for (int i = 0; i < Lines.count(); i++) {
    QString Line = Lines.at(i);
    QStringList Parts = Line.split('\t'); // or whatever the separator character is
    if (Parts.count() >= 4) {
        MyStruct t;
        t.Col1 = Parts.at(0).toInt();
        t.Col2 = Parts.at(1).toInt();
        t.Col3 = Parts.at(2).toInt();
        t.Col4 = Parts.at(3).toInt();
        MyArray.append(t);
    } else {
        // Malformed input (including the empty string after a final newline), do something
    }
}

Your data is now parsed and stored in the MyArray vector.
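Splitting the whole file into a QStringList, and then every line into another list, allocates a temporary string per line and per field, which adds up for a file this large. A hedged standard C++ sketch of a single-pass alternative: it walks the buffer once with std::strtol and writes straight into the result vector. It assumes tab-separated integer columns and '\n' line endings (a trailing '\r' is tolerated), and skips lines with fewer than four numbers; the name parseTsv is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
};

// Hypothetical single-pass parser for lines such as "1\t2\t3\t4\n".
// Lines with fewer than four integer columns are skipped; extra columns are ignored.
std::vector<MyStruct> parseTsv(const std::string &text)
{
    std::vector<MyStruct> out;
    const char *p = text.c_str();
    while (*p != '\0') {
        int cols[4];
        int n = 0;
        while (n < 4) {
            // Only call strtol at a number start, so it cannot skip past '\n'.
            if (!std::isdigit(static_cast<unsigned char>(*p)) && *p != '-' && *p != '+')
                break;
            char *end = nullptr;
            long v = std::strtol(p, &end, 10);
            if (end == p)
                break;                      // a lone sign, not a number
            cols[n++] = static_cast<int>(v);
            p = end;
            if (*p == '\t')
                ++p;                        // skip the column separator
        }
        if (n == 4)
            out.push_back({cols[0], cols[1], cols[2], cols[3]});
        while (*p != '\0' && *p != '\n')
            ++p;                            // drop the rest of the line
        if (*p == '\n')
            ++p;
    }
    return out;
}
```

With Qt, the buffer could come from QFile::readAll() via Data.toStdString(); calling MyArray-style reserve() up front is possible if you count the lines first.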

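As user2617519 said, this can be done with multithreading. I see that you read each line and parse it immediately. Instead, put the lines in a queue, then have several threads pop them off the queue and parse the data into the structures.
A simpler approach (without the complexity of multithreading) is to split the input file into several files and run the same number of processes to parse them. The data can be merged afterwards.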
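A minimal C++11 sketch of the multithreaded idea. Instead of a shared queue, which needs locking, it assumes all lines are already in memory, gives each std::thread a contiguous slice of them, and lets each thread write into its own range of a preallocated result vector, so join() is the only synchronization needed. This static partitioning is a swapped-in alternative to the queue described above; parseLine (a stand-in for ReadCont), Record and parseParallel are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

struct Record { long value; };

// Hypothetical stand-in for ReadCont: a record is just the first integer on the line.
Record parseLine(const std::string &line)
{
    return Record{std::strtol(line.c_str(), nullptr, 10)};
}

// Parses lines[i] into out[i] using up to `threads` worker threads.
std::vector<Record> parseParallel(const std::vector<std::string> &lines, unsigned threads)
{
    std::vector<Record> out(lines.size());
    threads = std::max(1u, threads);
    const std::size_t chunk = (lines.size() + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(lines.size(), begin + chunk);
        if (begin >= end)
            break;
        // Each worker owns the disjoint range [begin, end) of `out`.
        workers.emplace_back([&lines, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                out[i] = parseLine(lines[i]);
        });
    }
    for (std::thread &w : workers)
        w.join();
    return out;
}
```

std::thread::hardware_concurrency() is a reasonable value for threads; it may return 0, which the sketch clamps to 1.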

QFile::readAll() can cause memory problems with very large files, and std::getline() is slow (as is ::fgets()).

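I had a similar problem: I needed to parse very large delimited text files for display in a QTableView. Using a custom model, I scanned the file once to find the offset at which each line starts. Then, whenever data has to be shown in the table, I read that line and parse it on demand. This means a lot of repeated parsing, but it is fast enough that there is no noticeable lag when scrolling or updating.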

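It also has the added benefit of low memory usage, because the file contents are never loaded into memory. With this strategy, files of almost any size can be handled.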

Parsing code:

m_fp = ::fopen(path.c_str(), "rb"); // binary mode: no newline translation, so offsets are exact byte positions
if (m_fp != NULL)
{
  // read the file to get the row pointers
  char buf[BUF_SIZE+1];
  long pos = 0;
  m_data.push_back(RowData(pos)); // the first row starts at offset 0
  size_t nr = 0;
  while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)) > 0)
  {
    buf[nr] = 0; // null-terminate the buffer so strchr() stops at the end of the data read
    // find new lines in the buffer
    char *c = buf;
    while ((c = ::strchr(c, '\n')) != NULL)
    {
      m_data.push_back(RowData(pos + (c - buf) + 1)); // next row starts after the '\n'
      c++;
    }
    pos += nr;
  }
  // note: a '\n' at the very end of the file adds one empty last row
  // squeeze any extra memory not needed in the collection
  m_data.squeeze();
}

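RowData and m_data are specific to my implementation; they simply cache information about each line of the file (such as its file position and number of columns).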
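Because RowData and m_data are not shown, here is a hedged standard C++ sketch of the same lazy strategy with hypothetical names: buildLineIndex records the byte offset at which each line starts, and readRow seeks to one offset and splits only that line on commas. It works on any std::istream; with a real file, open a std::ifstream in binary mode so the offsets match byte positions. Unlike the code above, it does not add an empty row for a trailing newline.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical index builder: returns the byte offset where each line starts.
std::vector<std::streamoff> buildLineIndex(std::istream &in)
{
    std::vector<std::streamoff> offsets;
    in.clear();
    in.seekg(0);
    char buf[4096];
    std::streamoff pos = 0;
    bool atLineStart = true;
    // read() fails on the last, partial block, so also check gcount().
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        const std::streamsize n = in.gcount();
        for (std::streamsize i = 0; i < n; ++i) {
            if (atLineStart) {
                offsets.push_back(pos + i);
                atLineStart = false;
            }
            if (buf[i] == '\n')
                atLineStart = true;
        }
        pos += n;
    }
    return offsets;
}

// Hypothetical on-demand reader: seeks to one line and splits it on commas.
std::vector<std::string> readRow(std::istream &in, std::streamoff offset)
{
    in.clear();                      // reset eof/fail flags left by earlier reads
    in.seekg(offset);
    std::string line;
    std::getline(in, line);
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ','))
        fields.push_back(field);
    return fields;
}
```

In a table model, the index is built once when the file is opened, and readRow would be called from the model's data() for the rows actually on screen.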

Another performance strategy I used is to parse each line with a QByteArray rather than a QString. Unless you need Unicode data, this saves both time and memory:

// optimized line reading procedure
QByteArray str;
char buf[BUF_SIZE+1];
::fseek(m_fp, rd.offset, SEEK_SET); // jump to the start of the requested row
size_t nr = 0;
while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)) > 0)
{
  buf[nr] = 0; // null-terminate the string
  // find the end of the line in the buffer
  char *c = ::strchr(buf, '\n');
  if (c != NULL)
  {
    *c = 0; // cut the buffer at the newline
    str += buf;
    break;
  }
  str += buf;
}
return str.split(',');

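If you need to split each line on several possible delimiter characters rather than a single one, use ::strtok() (its second argument is a set of delimiter characters, not a multi-character separator string).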
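A small sketch of ::strtok() splitting a line whose fields may be separated by either ',' or ';'. Keep in mind that strtok writes '\0' into the buffer it scans, keeps hidden state between calls (so it is not thread-safe), and collapses consecutive delimiters, so empty fields disappear. splitAny is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on any character in `delims`.
std::vector<std::string> splitAny(std::string line, const char *delims)
{
    std::vector<std::string> tokens;
    // strtok modifies its input, so it works on this local copy of the line.
    for (char *tok = std::strtok(&line[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims))
        tokens.push_back(tok);
    return tokens;
}
```

For example, splitAny("a,b;;c", ",;") produces three tokens, "a", "b" and "c"; the empty field between the two semicolons is lost.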