Write std::bitset to a binary file and load the file back into std::bitset

Keywords: bitset, std, file loading, binary file. Updated: 2023-10-16

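I am working on a project that uses bitsets. Because the provided text file is very large (>800M), loading it directly into a std::bitset takes more than 25 seconds. So I want to preprocess the text file into a memory-dumped binary file. Since each 8-bit character is converted to a single bit, the file load time should drop considerably. I wrote some demo code: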

#include <iostream>      
#include <bitset>         
#include <string>
#include <stdexcept>      
#include <fstream>
#include <math.h> 
int main () {
    const int MAX_SIZE = 19;
    try {
        std::string line = "1001111010011101011";
        int copy_bypes = (int)ceil((float)MAX_SIZE / 8.0);

        std::bitset<MAX_SIZE>* foo = new (std::nothrow)std::bitset<MAX_SIZE>(line);
        std::ofstream os ("data.dat", std::ios::binary);
        os.write((const char*)&foo, copy_bypes);
        os.close();

        std::bitset<MAX_SIZE>* foo2 = new (std::nothrow)std::bitset<MAX_SIZE>();
        std::ifstream input("data.dat",std::ios::binary);
        input.read((char*)&foo2, copy_bypes);
        input.close();
        for (int i = foo2->size() -1 ; i >=0 ; --i) {
            std::cout  << (*foo2)[i];
        }
        std::cout <<std::endl;
    }
    catch (const std::invalid_argument& ia) {
        std::cerr << "Invalid argument: " << ia.what() << '\n';
    }
    return 0;
}

It seems to work, but I am worried about whether this usage is really safe in a production environment.

Thanks in advance.

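Writing the raw bytes of a non-trivial class to a file is really dangerous. You should convert the bitset into well-defined binary data. If you know your data fits in an unsigned long long, you can use bitset<>::to_ullong() and write/read that unsigned long long. If you want the file to be cross-platform, e.g. shared between 64-bit and 32-bit platforms, you should use fixed-size types.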
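The to_ullong() suggestion above might look roughly like the following. This is a minimal sketch, not a definitive implementation: the names kBits, save_bits and load_bits are made up for illustration, the width of 19 is taken from the question's demo, and the choice to store the value as 8 bytes, least significant byte first, is an assumption made here so that the file layout does not depend on the machine that wrote it.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

constexpr std::size_t kBits = 19;  // hypothetical width, same as the question's demo

// Store the bitset as one 64-bit value, least significant byte first,
// so the on-disk layout is independent of the host's endianness.
bool save_bits(const std::bitset<kBits>& bits, const std::string& path) {
    static_assert(kBits <= 64, "this sketch only handles bitsets of up to 64 bits");
    const std::uint64_t value = bits.to_ullong();
    unsigned char bytes[8];
    for (int i = 0; i < 8; ++i) {
        bytes[i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }
    std::ofstream os(path, std::ios::binary);
    os.write(reinterpret_cast<const char*>(bytes), sizeof bytes);
    return static_cast<bool>(os);
}

// Read the 8 bytes back and rebuild the bitset from the decoded value.
bool load_bits(std::bitset<kBits>& bits, const std::string& path) {
    unsigned char bytes[8];
    std::ifstream is(path, std::ios::binary);
    if (!is.read(reinterpret_cast<char*>(bytes), sizeof bytes)) return false;
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    bits = std::bitset<kBits>(value);
    return true;
}
```

This only covers bitsets of at most 64 bits, so it would not be enough for the 800M input from the question; it is meant to show the idea of writing a well-defined value instead of the object's memory.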

These two lines are wrong:

os.write((const char*)&foo, copy_bypes);
input.read((char*)&foo2, copy_bypes);

You are passing the address of the pointer (foo, foo2) rather than the std::bitset object it points to. But even if that is corrected:

os.write((const char*)foo, copy_bypes);
input.read((char*)foo2, copy_bypes);

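it is not safe to use in a production environment. The code assumes that std::bitset is a POD type and accesses it as such. As your code grows more complex, you risk writing or reading too much, and nothing protects you from undefined behavior. std::bitset is designed for convenience, not speed, and this shows in the methods it offers for accessing bits: there is no proper way to get the address of its storage, unlike what std::vector or std::string provide. If you need performance, you need to implement it yourself.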
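One way to "implement it yourself" without touching the bitset's internal memory is to pack the bits into a byte buffer through the public interface and write that buffer. The sketch below is only an illustration under stated assumptions: pack_bits, unpack_bits, save_packed and load_packed are hypothetical names, and the layout (bit i stored in bit i % 8 of byte i / 8) is a choice made here, not something std::bitset defines.

```cpp
#include <bitset>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Pack bit i of the bitset into bit (i % 8) of byte (i / 8).
template <std::size_t N>
std::vector<unsigned char> pack_bits(const std::bitset<N>& bits) {
    std::vector<unsigned char> bytes((N + 7) / 8, 0);
    for (std::size_t i = 0; i < N; ++i) {
        if (bits[i]) bytes[i / 8] |= static_cast<unsigned char>(1u << (i % 8));
    }
    return bytes;
}

// Inverse of pack_bits; returns false if the buffer has the wrong size.
template <std::size_t N>
bool unpack_bits(const std::vector<unsigned char>& bytes, std::bitset<N>& bits) {
    if (bytes.size() != (N + 7) / 8) return false;
    bits.reset();
    for (std::size_t i = 0; i < N; ++i) {
        if ((bytes[i / 8] >> (i % 8)) & 1u) bits.set(i);
    }
    return true;
}

// Write the packed bytes to a binary file.
template <std::size_t N>
bool save_packed(const std::bitset<N>& bits, const std::string& path) {
    const std::vector<unsigned char> bytes = pack_bits(bits);
    std::ofstream os(path, std::ios::binary);
    os.write(reinterpret_cast<const char*>(bytes.data()),
             static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(os);
}

// Read exactly ceil(N / 8) bytes and rebuild the bitset from them.
template <std::size_t N>
bool load_packed(std::bitset<N>& bits, const std::string& path) {
    std::vector<unsigned char> bytes((N + 7) / 8);
    std::ifstream is(path, std::ios::binary);
    if (!is.read(reinterpret_cast<char*>(bytes.data()),
                 static_cast<std::streamsize>(bytes.size()))) {
        return false;
    }
    return unpack_bits(bytes, bits);
}
```

For the 19-bit demo string this produces a 3-byte file, matching the copy_bypes computation in the question. The per-bit loop favors clarity over speed; for an input on the order of 800M bits, a custom container that keeps its bits in a std::vector of fixed-size words and writes that vector directly would likely be faster.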