Decompress concatenated zlib streams without reading the next bytes

Keywords: reading the next bytes, zlib, decompression. Updated: 2023-10-16

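I have a file that I cannot modify, made up of 3 concatenated zlib streams. The data is not very large (a few hundred KB). How can I read them? There is the Qt function qUncompress(), but it needs a length as a parameter, and I don't know the actual length of each stream.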

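Solution 1: read the data through a stream. The code I have seen for this reads the data in chunks and stops when it hits an error. The problem is that reading a fixed-size chunk consumes the whole chunk, so if the size of a stream is not an exact multiple of N, the start of the next stream is lost and that stream is corrupted.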

Pseudocode:

while (no error) {
    read N bytes
    decompress_next(these N bytes)
}
... Here there may be up to N-1 totally skipped bytes...

It works with N=1, but that feels a bit hacky. Is there a better option?

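Solution 2: decompress the stream, compress it again, and take the size of the first block. Go to that offset, then read, and so on... (This would not apply when the input stream is not writable, but it should work in my case.)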

If the code is not trivial, I could ultimately use a C or C++ library (ideally a lightweight one).

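This may not be possible. I don't know much about the zlib algorithm, in particular whether it knows when a stream ends or just reads "stateless" data.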

Edit: the zlib-flate utility looks like it does this for Solution #2, so it is apparently possible.

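You can do this with the zlib library directly. inflate() returns Z_STREAM_END when it reaches the end of a stream and leaves next_in and avail_in describing the input it has not consumed yet, so you can call inflateReset() and carry on with the next stream. Here is some sample code: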

#include <array>
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
#include <zlib.h>

// Compress one string into a standalone zlib stream.
std::vector<uint8_t> my_compress(std::string const& in_data)
{
    // compress() takes the output length as uLongf*, which is not always size_t
    uLongf destlen = compressBound(in_data.size());
    std::vector<uint8_t> outbuf(destlen);
    auto srcptr = reinterpret_cast<Bytef const*>(in_data.data());
    int result = compress(outbuf.data(), &destlen, srcptr, in_data.size());
    assert(result == Z_OK);
    outbuf.resize(destlen);
    return outbuf;
}

// Build the test input: several zlib streams stored back to back.
std::vector<uint8_t> generate_concatenated_data()
{
    std::vector<uint8_t> outbuf;
    std::vector<std::string> strings =
    {
        "zlib is a widely-used library",
        "remember to check your error returns",
        "the quick brown fox jumps over the lazy dog",
        "Never tell me the odds."
    };
    for (auto const& s : strings)
    {
        auto compressed = my_compress(s);
        // append each compressed stream to the end of the buffer
        outbuf.insert(end(outbuf), begin(compressed), end(compressed));
    }
    return outbuf;
}

void print_buffer(std::vector<char> const& buf)
{
    std::cout << "buffer contains: ";
    for (char c : buf)
    {
        std::cout << c;
    }
    std::cout << '\n';
}

int main()
{
    std::vector<uint8_t> concat_data = generate_concatenated_data();

    // scratch buffer for decompressing the data (its size doesn't matter)
    std::array<uint8_t, 1024> scratch = {};

    z_stream s{};
    // standard zlib init procedure
    s.zalloc = nullptr;
    s.zfree = nullptr;
    s.opaque = nullptr;
    int init_result = inflateInit(&s);
    // insert error checking here.
    assert(init_result == Z_OK);

    s.next_in = concat_data.data();
    s.avail_in = static_cast<uInt>(concat_data.size());
    while (s.avail_in > 0)
    {
        // output destination buffer
        std::vector<char> out_data;

        int inflate_ret = Z_OK;
        while (true)
        {
            s.next_out = scratch.data();
            s.avail_out = static_cast<uInt>(scratch.size());
            inflate_ret = inflate(&s, Z_NO_FLUSH);
            // make sure we decoded right
            assert(inflate_ret == Z_OK || inflate_ret == Z_STREAM_END);
            auto bytes_decoded = scratch.size() - s.avail_out;

            // append whatever this call produced
            out_data.insert(out_data.end(), scratch.begin(), scratch.begin() + bytes_decoded);

            // is this stream done?
            if (inflate_ret == Z_STREAM_END) break;
        }
        assert(inflate_ret == Z_STREAM_END);

        // get ready for the next stream; next_in and avail_in are left untouched
        auto reset_result = inflateReset(&s);
        assert(reset_result == Z_OK);

        // do something with the data.
        print_buffer(out_data);
    }

    // cleanup
    inflateEnd(&s);
    std::cout << "end.\n";
}

You will get the following output:

buffer contains: zlib is a widely-used library
buffer contains: remember to check your error returns
buffer contains: the quick brown fox jumps over the lazy dog
buffer contains: Never tell me the odds.
end.

The zlib documentation is available here: https://zlib.net/manual.html