C++ std::bad_alloc on loading a 1.9 million line file of floating point values - source code provided

Updated: 2023-10-16

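I have an assignment in which I read 1.9 million lines of floating point values (360 floats per line) and then need to operate on that data. This worked at first, and I am not sure why I am getting bad_alloc today. Here is my code. I would like to know whether there is anything I can do to optimize it. I am using vectors, and I was hoping I would not have to use an array of structs. I could simply define a struct and create an array of them, where each struct holds x, y and an array of float values. Would that make much of a difference?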

I would really appreciate any critique of my implementation and code. Thanks!

Implementation:

#include "file_parser.hpp"
Parser::Parser(char* fname){
    fileName = fname;
}
/*
*   The function parses a file that was set in the constructor
*   And returns a map of the file
*/
VectorsMap Parser::parseFile(){
    //open file
    fPtr = fopen(fileName, "r");
    total_rows = 0;
    line = (char*)malloc(sizeof(char)*LINE_MAX);
    //parse the file line by line
    while(fgets(line, LINE_MAX, fPtr)){
        //make sure that we do not read an empty line
        if(line[0] != '\n') {
            //send the line to be further parsed
            parseLine(line);
            //increment total rows count
            total_rows++;   
        }
    }
    return vector_points;
}
void Parser::doCleanUp(){
    fclose(fPtr);
    free(line);
    vector_points.clear();
}
/**
*   Parse a line and tokenize it
*   while extracting X and Y points
*   and vectors and put them in a VectorsMap(defined in file_parser.h)
*/
void Parser::parseLine(char* line){
    //collection of vectors.
    std::vector<float> vectors;
    char* point;
    //grab the x and y tokens
    char* tk1 = strtok(line, ",");
    char* tk2 = strtok(NULL, ",");
    //value for indexing
    int i=0;
    char* tmp;
    //make sure we have two correct x and y points
    if(tk1 == NULL || tk2 == NULL){ return; }
    //convert the tokens to floats
    float x = strtof(tk1, NULL);
    float y = strtof(tk2, NULL);
    //create the x and y pair used to insert vectors into the map
    XYPair pair = XYPair(x, y);
    //tokenize until end of line
    while(point=strtok(NULL, ",")){
        //convert the token to float
        float f_point = strtof(point, NULL);
        //push the float to the vector
        vectors.push_back(f_point);
        i++;
    }
    //insert in the vectormap.
    vector_points.insert(VectorsPair(pair, vectors));
}
int Parser::getTotalRows(){
    return total_rows;
}

Header file:

//create specific types to make my life easier later on 
typedef std::pair<float, float> XYPair;
typedef std::pair<XYPair, std::vector<float> > VectorsPair;
typedef std::map<XYPair, std::vector<float> > VectorsMap;
class Parser{       
    public:
        //constructor
        Parser(char* fname);
        VectorsMap parseFile();
        int getTotalRows();
        int row_values;
        int total_rows;
        void doCleanUp();
    private:
        //collection of all x y points and their vectors 
        VectorsMap vector_points;
        FILE* fPtr; //file pointer to file to be parsed
        char *line; //line to parse file line by line
        char* fileName; //path/name of file to be parsed
        void parseLine(char* line);
};

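Calling malloc and fopen inside parseFile and then releasing them in a separate function is very error-prone. If parseFile is called twice without doCleanUp being called in between, you leak both the memory and the file handle.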

I would stop using malloc and strtok.

VectorsMap Parser::parseFile(){
    //open file
    std::ifstream f(fileName);
    total_rows = 0;
    std::string line;
    //parse the file line by line
    while(std::getline(f, line)){
        //make sure that we do not read an empty line
        if(line.size()) {
            //send the line to be further parsed
            parseLine(line);
            //increment total rows count
            total_rows++;   
        }
    }
    return vector_points;
}

Then rewrite parseLine so that it no longer uses the horrible strtok function, for example by using Boost.Tokenizer, or std::istringstream together with std::getline.
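As a rough illustration of the std::istringstream and std::getline route, here is a minimal sketch. The names parseLineInto, nextToken and kExpectedValues are made up for this example and are not part of the original code. It is written as a free function that takes the map as a parameter so that it stands alone, and it keeps std::strtof for the conversion so that malformed numbers behave as they did before (they become 0).

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<float, float> XYPair;
typedef std::map<XYPair, std::vector<float> > VectorsMap;

// Assumption: roughly 360 values per line, as stated in the question.
const std::size_t kExpectedValues = 360;

// Hypothetical helper: fetch the next non-empty comma-separated token.
// Skipping empty fields mirrors what strtok did in the original code.
static bool nextToken(std::istringstream& in, std::string& token) {
    while (std::getline(in, token, ',')) {
        if (!token.empty()) return true;
    }
    return false;
}

// Hypothetical replacement for Parser::parseLine. Splits one line into
// x, y and the remaining values and stores them in `out`.
// Returns false if the line does not contain both x and y.
bool parseLineInto(const std::string& line, VectorsMap& out) {
    std::istringstream fields(line);
    std::string token;

    if (!nextToken(fields, token)) return false;
    const float x = std::strtof(token.c_str(), NULL);
    if (!nextToken(fields, token)) return false;
    const float y = std::strtof(token.c_str(), NULL);

    std::vector<float> values;
    values.reserve(kExpectedValues);  // one allocation per line
    while (nextToken(fields, token)) {
        values.push_back(std::strtof(token.c_str(), NULL));
    }

    // Like map::insert in the original, emplace keeps the first entry
    // when the same (x, y) key appears again. The vector is moved into
    // the map instead of being copied.
    out.emplace(XYPair(x, y), std::move(values));
    return true;
}
```

Inside the class, the same body would operate on vector_points directly, and parseFile would call it with the std::string obtained from std::getline.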

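Also note that you read the data into vector_points and then return a copy of it, which means you need twice as much memory as the data set itself uses. You can keep just one copy of the data by doing this: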

return std::move(vector_points);

That way the data is moved into the return value instead of being copied.
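To put a number on it: 1.9 million lines times 360 values times 4 bytes (assuming a 4-byte float) is about 2.7 GB for the raw floats alone, before any container overhead, so a second copy is expensive. The sketch below contrasts copying with moving, using a made-up Holder class as a stand-in for Parser. The class and its method names are assumptions for illustration only.

```cpp
#include <map>
#include <utility>
#include <vector>

typedef std::pair<float, float> XYPair;
typedef std::map<XYPair, std::vector<float> > VectorsMap;

// Hypothetical stand-in for Parser, reduced to the member that matters.
class Holder {
public:
    void add(float x, float y, const std::vector<float>& values) {
        points[XYPair(x, y)] = values;
    }

    // Returning the member by value copies it: afterwards the object and
    // the caller each own a full set of the data.
    VectorsMap copyOut() const { return points; }

    // std::move turns the member into an rvalue, so the map's move
    // constructor hands its nodes to the return value without duplicating
    // the stored floats.
    VectorsMap moveOut() { return std::move(points); }

    // Address of the first stored float (requires a non-empty map), used
    // to check whether the storage was reused or duplicated.
    const float* firstValueAddress() const {
        return points.begin()->second.data();
    }

private:
    VectorsMap points;
};
```

After moveOut the member is left in a valid but unspecified state (in practice it is empty), so a later call to clear(), as in doCleanUp, is still safe.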