Reading/Writing Hex Bytes from/to a Binary File

Keywords: hex bytes, binary file, reading. Updated: 2023-10-16

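I need to read a file in binary mode and store the bytes as hex values in any STL container (preferably std::list). Later I need to write them back to a file, also in binary mode. So, I declared: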

typedef unsigned char BYTE;
std::ifstream File("File_Name", std::ios::binary);
std::list<BYTE> File_Bytes;

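From all my searching, I have understood a few things. Reading can be done with std::istream::read() or std::istreambuf_iterator (I could be terribly wrong; please correct me). The read() function only takes a char* to the memory where the bytes will be stored, plus the number of bytes to read, as its arguments.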

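How would I read the bytes of the file into a list of BYTE, and then write from the list of BYTE back to a file, using istream and ostream respectively? Please clarify this for me. Thanks.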

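Note: This is actually for a Huffman encoder/decoder. I need to compress and decompress inside the program and write the decompressed bits to an output file. This is to verify that the compression is lossless and that the program is correct. Also, can anyone tell me how to write the encoded binary bits to a file, and what file extension a Huffman-encoded file should have? Thanks.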

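As the comments have clarified, you want to load the bytes of a binary file into some STL container of char - or more precisely, uint8_t - and to save such a container back to a binary file.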

There are many ways to do this, including, as you have discovered, using std::basic_istream::read and std::basic_ostream::write, or std::istream_iterator and std::ostream_iterator.

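The latter approach yields the simplest code. The fread/fwrite approach yields the fastest code, but simpler is plainly better for what will be merely the prologue and epilogue operations of your program.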
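Since the question asks for a std::list of BYTE and mentions std::istreambuf_iterator, that variant can be sketched as follows. This is only an illustration under stated assumptions: the helper names load_list and save_list are made up here and do not come from the answer. std::istreambuf_iterator reads unformatted characters, so whitespace bytes are kept without any extra flag.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <list>
#include <stdexcept>
#include <string>

typedef unsigned char BYTE;

// Hypothetical helper: read every byte of `path` into a std::list<BYTE>.
std::list<BYTE> load_list(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("Could not open \"" + path + "\" for reading");
    }
    // Unformatted, byte-by-byte iteration over the stream buffer.
    return std::list<BYTE>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Hypothetical helper: write every element of `bytes` to `path`.
void save_list(const std::list<BYTE>& bytes, const std::string& path)
{
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        throw std::runtime_error("Could not open \"" + path + "\" for writing");
    }
    // Each BYTE is converted to char and written unformatted.
    std::copy(bytes.begin(), bytes.end(), std::ostreambuf_iterator<char>(out));
}
```

Calling save_list(load_list("in.bin"), "out.bin") should then produce a byte-for-byte copy of the input, which is one way to check the lossless round trip the question mentions.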

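Here is a pair of matching template functions that will respectively:

Return an STL container of the parameter type Container, populated with the byte sequence of an input file.

Copy the elements of an STL container of the parameter type Container to a byte sequence in an output file.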

#include <fstream>
#include <iterator>
#include <algorithm>
#include <stdexcept>
#include <string>
#include <cstdint>

// Load every byte of the named file into a Container.
template<class Container>
Container binary_load(std::string const & bin_file_name)
{
    std::ifstream in(bin_file_name,std::ios::binary);
    if (!in) {
        throw std::runtime_error("Could not open \"" + bin_file_name +
            "\" for reading");
    }
    std::noskipws(in); // PON 1
    return Container(std::istream_iterator<std::uint8_t>(in),
                        std::istream_iterator<std::uint8_t>()); //PON 2
}

// Write the elements of data to the named file, one byte per element.
template<class Container>
void binary_save(Container && data, std::string const & bin_file_name)
{
    std::ofstream out(bin_file_name,std::ios::binary);
    if (!out) {
        throw std::runtime_error("Could not open \"" + bin_file_name +
            "\" for writing");
    }
    std::copy(data.begin(),data.end(),
        std::ostream_iterator<std::uint8_t>(out,"")); // PON 3
}

To compile a basic use case, append the following:

#include <vector>
#include <string>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc < 2) {
        return 1; // no input file was given on the command line
    }
    string infile = argv[1];
    string outfile = infile + ".saved";
    auto data(binary_load<vector<std::uint8_t>>(infile));
    binary_save(data,outfile);
    return 0;
}

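This compiles as C++11 or later. The resulting program loads the file you specify as its first command-line argument into a std::vector<std::uint8_t>, and then just saves that vector to a file of the same name with the additional extension .saved. Your program, of course, will load one vector and save a different one.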

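Points of note (marked PON 1 to PON 3 in the code):

PON 1: This statement is required to tell the stream in that it should extract all bytes, rather than skipping whitespace bytes.
PON 2: This statement constructs the populated Container directly from a [begin,end) iterator range, in the way that every STL container can be constructed. The begin iterator std::istream_iterator<std::uint8_t>(in) is the start-of-stream iterator for in, and the end iterator std::istream_iterator<std::uint8_t>() is the end-of-stream iterator for every stream.
PON 3: This statement copies the byte sequence to a std::ostream_iterator<std::uint8_t> that is initially positioned at the start of out. The "" argument to the iterator's constructor tells it that the empty string (i.e. nothing) should separate consecutive output bytes.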

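These function templates are slightly more general than is strictly needed:

The Container type with which you call binary_load need not be a container of uint8_t, or even of elements of the same size. It need only be a container type that can be constructed from an iterator range over a sequence of uint8_t.
Likewise, the Container type with which you call binary_save need only be one whose elements are of a type E that is implicitly convertible to uint8_t, with the caveat that truncation will occur if you perversely choose to save any E value that is not representable in a uint8_t.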

So, putting these together, no harm will be done if, for example, you replace vector<uint8_t> with vector<long> in the example program.
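The truncation caveat can be made concrete with a small sketch. The helper as_saved_bytes is hypothetical and is not part of binary_save; it only mirrors the per-element conversion to std::uint8_t that happens when elements of a wider type are copied to a std::ostream_iterator<std::uint8_t>. Conversion to an unsigned 8-bit type keeps the value modulo 256, so values in 0 to 255 survive and larger ones do not.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: shows which byte each long would be stored as.
std::vector<std::uint8_t> as_saved_bytes(const std::vector<long>& values)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(values.size());
    for (long v : values) {
        // Conversion to an unsigned 8-bit type keeps v modulo 256.
        bytes.push_back(static_cast<std::uint8_t>(v));
    }
    return bytes;
}
```

For the input {65, 255, 256, 300} this yields the bytes 65, 255, 0 and 44. A vector<long> that was itself filled by binary_load only ever holds values from 0 to 255, which is why the substitution in the example program is harmless.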

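Of course, if you mistakenly call either function template with a container type that does not satisfy the template's requirements on Container, the code will not compile.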

Continued, in response to the OP's comments

Can I use unsigned char instead of [uint8_t]?

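Yes, uint8_t will almost inevitably be defined as unsigned char by your compiler, and any 8-bit integral type will do. uint8_t just says "byte" most clearly. If you would like to further parameterize the template functions with respect to the "byte" type, you can do it like this: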

...
#include <type_traits>
template<class Container, typename Byte = std::uint8_t>
Container binary_load(std::string const & bin_file_name) {
    static_assert(sizeof(Byte) == 1,"Size of `Byte` must be 1");
    // `std::uint8_t` becomes `Byte` 
    ...
}
template<class Container, typename Byte = std::uint8_t>
void binary_save(Container && data, std::string const & bin_file_name) {
    static_assert(sizeof(Byte) == 1,"Size of `Byte` must be 1");
    // `std::uint8_t` becomes `Byte` 
    ...
}

As for the right file extension for Huffman-encoded files, there is no de facto standard. Choose whatever you like.

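And unless you are required to use MS VC10 (and its C++11 support) for your console build, there is no need to. An up-to-date GCC toolchain is freely available for Windows, along with IDEs that support it: CodeLite, Code::Blocks.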

I recommend using a fixed-size buffer of uint8_t:

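#include <cstdint>
#include <fstream>
const unsigned int BUFFER_SIZE = 1024*1024;
static std::uint8_t buffer[BUFFER_SIZE]; // static storage: 1 MiB may be too large for the stack
// ...
my_file.read(reinterpret_cast<char *>(buffer), BUFFER_SIZE);
std::streamsize bytes_read = my_file.gcount(); // the last chunk may be shorter than BUFFER_SIZE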

In your program, you would read a buffer, process it, and then read another buffer from the input.
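The read, process, read-again cycle can be sketched as a loop. This is a minimal illustration, not the answerer's code: the function name copy_in_chunks and the 4096-byte buffer size are assumptions, and the "processing" step is just a write to an output file so that the result is easy to check.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: copies in_path to out_path one buffer at a time
// and returns the number of bytes copied.
std::size_t copy_in_chunks(const std::string& in_path,
                           const std::string& out_path)
{
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary);
    if (!in || !out) {
        throw std::runtime_error("Could not open input or output file");
    }

    std::array<std::uint8_t, 4096> buffer; // fixed-size buffer (size is an assumption)
    std::size_t total = 0;
    while (in) {
        in.read(reinterpret_cast<char*>(buffer.data()),
                static_cast<std::streamsize>(buffer.size()));
        const std::streamsize got = in.gcount(); // shorter on the final chunk
        if (got <= 0) {
            break;
        }
        // "Process" the chunk; here it is simply written back out.
        out.write(reinterpret_cast<const char*>(buffer.data()), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

The important detail is gcount(): the final read usually fills only part of the buffer, so only that many bytes should be processed.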

For your purposes, an array is a more efficient container than std::vector or std::list.

Also, use uint8_t because it is a standardized type.

Your question touches on two very different topics.

How to read and write a binary file
How to manipulate binary data (inflate/deflate)

File IO

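Two popular ways of reading files are read() and getline(). I use read() when working with binary files and getline() when reading text files line by line. Since you are dealing with binary data, I suggest using read().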

#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>
// filePath is assumed to be defined elsewhere
// Open Binary file at the end
std::ifstream input(filePath, std::ios::ate | std::ios::binary);
assert(input.is_open());
// Calculate size
std::size_t end = input.tellg();
input.seekg(0,std::ios::beg);
std::size_t beg = input.tellg();
std::size_t len = end - beg;
assert(len > 0);
// Read in Binary data
std::vector<char> binaryData(len);
input.read(binaryData.data(), static_cast<std::streamsize>(len));
// Close
input.close();

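At this stage of the game, you have all of the binary data stored in a vector. I know that in your example you expressed a preference for list, but given that you want to process a contiguous stream of bytes, vector seems more in line with what you are doing.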
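The question also asks about writing the bytes back. A matching sketch with std::ostream::write might look like this; the function name writeBinary and its error handling are assumptions added for illustration, not part of the answer.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: writes the whole vector to filePath in binary mode.
void writeBinary(const std::string& filePath,
                 const std::vector<char>& binaryData)
{
    std::ofstream output(filePath, std::ios::binary | std::ios::trunc);
    if (!output) {
        throw std::runtime_error("Could not open \"" + filePath + "\" for writing");
    }
    // A vector stores its elements contiguously, so one write() call suffices.
    output.write(binaryData.data(),
                 static_cast<std::streamsize>(binaryData.size()));
    if (!output) {
        throw std::runtime_error("Could not write to \"" + filePath + "\"");
    }
}
```

This is where the contiguous storage of vector pays off: a list would have to be written element by element.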

Binary

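There are several ways to work with binary data. You can use the trusty shift operators << and >> along with some good old & and/or | logic. However, if you want a more visual representation in your code, I suggest you look into std::bitset.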

Using a bitset, you can easily load the contents of the vector into an 8-bit binary representation.

std::bitset<8>  deflatedBinary(binaryData[0]);
std::bitset<12> inflatedBinary;

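The first bitset holds the 8-bit binary representation of the first char, and the second one, inflatedBinary, has 12 bits that are all zeroed. From here you can access their elements through the index operator []. You can read more in the std::bitset documentation.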
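For the part of the question about writing encoded bits to a file, the shift-and-mask route mentioned above can be sketched as follows. The helper names pack_bits and unpack_bits, the most-significant-bit-first order, and the zero padding of the last byte are all assumptions chosen for this illustration. A real encoder would also need to record the bit count somewhere (for example in a header of its own design) so that the padding can be dropped on decoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs bits (most significant bit first) into bytes,
// zero-padding the final byte.
std::vector<std::uint8_t> pack_bits(const std::vector<bool>& bits)
{
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            // Set bit (7 - i % 8) of byte i / 8.
            bytes[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
        }
    }
    return bytes;
}

// Hypothetical inverse: recovers the first bit_count bits.
// Precondition: bit_count <= bytes.size() * 8.
std::vector<bool> unpack_bits(const std::vector<std::uint8_t>& bytes,
                              std::size_t bit_count)
{
    std::vector<bool> bits(bit_count);
    for (std::size_t i = 0; i < bit_count; ++i) {
        bits[i] = ((bytes[i / 8] >> (7 - i % 8)) & 1u) != 0;
    }
    return bits;
}
```

The packed bytes can then be written with any of the binary output approaches shown in the answers, and unpacking them after reading gives back the original bit sequence.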