Fastest way to get data from a CSV in C++

Keywords: method, data, get, C++, from CSV  Updated: 2023-10-16

I have a large CSV (about 75 MB) that looks like this:

1,2,4
5,2,0
1,6,3
8,3,1
...

I store my data with the following code:

#include <sstream>
#include <fstream>
#include <string>
#include <vector>
int main()
{
    char c; // to eat the commas
    int x, y, z;
    std::vector<int> xv, yv, zv;
    std::ifstream file("data.csv");
    std::string line;
    // read one line at a time, then parse "x,y,z" out of it
    while (std::getline(file, line)) {
        std::istringstream ss(line);
        ss >> x >> c >> y >> c >> z;
        xv.push_back(x);
        yv.push_back(y);
        zv.push_back(z);
    }
    return 0;
}

On this large CSV (~75 MB), it took:

real        0m7.389s
user        0m7.232s
sys         0m0.132s

That is far too long!

Recently, using a Sublime Text snippet, I found another way to read a file:

#include <iostream>
#include <vector>
#include <cstdio>
int main()
{
    std::vector<char> v;
    if (FILE *fp = fopen("data.csv", "r")) {
        char buf[1024];
        while (size_t len = fread(buf, 1, sizeof(buf), fp))
            v.insert(v.end(), buf, buf + len);
        fclose(fp);
    }
}

On this large CSV (~75 MB), it took (without extracting the data):

real        0m0.118s
user        0m0.036s
sys         0m0.080s

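That is a huge difference in time!

The question is how I can get the data from the char vector into the 3 vectors in a faster way. I don't know how to do it faster than my first approach.

Thanks! ^^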

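Of course your second version is much faster: it just reads the file into memory without parsing the values in it. The equivalent of the first version using C-style I/O would be something like this: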

if (FILE *fp = fopen("data.csv", "r")) {
    while (fscanf(fp, "%d,%d,%d", &x, &y, &z) == 3) {
        xv.push_back(x);
        yv.push_back(y);
        zv.push_back(z);
    }
    fclose(fp);
}

For me, this is about three times faster than the C++-style version. But a C++ version without the intermediate stringstream

while (file >> x >> c >> y >> c >> z) {
    xv.push_back(x);
    yv.push_back(y);
    zv.push_back(z);
}

is almost as fast.
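As for parsing the buffer that the question already holds in memory, one possible approach is to walk it with std::from_chars. The following is only a sketch, assuming a C++17 compiler and rows of exactly three comma-separated integers with no spaces or leading '+' signs. The helper name parse_rows is hypothetical and does not come from the question, and no timing has been measured for it here.

```cpp
#include <charconv>
#include <cstddef>
#include <system_error>
#include <vector>

// Hypothetical helper (not from the original post): parses "x,y,z" rows
// from the range [first, last) and appends the values to the three vectors.
// Returns the number of complete rows parsed and stops at the first
// malformed row.
std::size_t parse_rows(const char* first, const char* last,
                       std::vector<int>& xv,
                       std::vector<int>& yv,
                       std::vector<int>& zv)
{
    std::size_t rows = 0;
    const char* p = first;
    while (p != last) {
        // Skip line terminators ("\n" or "\r\n") and empty lines.
        if (*p == '\n' || *p == '\r') {
            ++p;
            continue;
        }
        int vals[3];
        bool ok = true;
        for (int i = 0; i < 3; ++i) {
            // from_chars does not skip whitespace and does not allocate.
            auto res = std::from_chars(p, last, vals[i]);
            if (res.ec != std::errc()) {
                ok = false;
                break;
            }
            p = res.ptr;
            if (i < 2) {
                // The first two fields must be followed by a comma.
                if (p == last || *p != ',') {
                    ok = false;
                    break;
                }
                ++p;
            }
        }
        if (!ok) {
            break;
        }
        xv.push_back(vals[0]);
        yv.push_back(vals[1]);
        zv.push_back(vals[2]);
        ++rows;
    }
    return rows;
}
```

With the std::vector<char> v from the question, the call might look like parse_rows(v.data(), v.data() + v.size(), xv, yv, zv).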

Store in the file how many numbers it contains. Then, when loading, size the vectors up front. That can save a little time.