Simple working example of GzipOutputStream and GzipInputStream with Protocol Buffers

Updated: 2023-10-16

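After a few days of experimenting with Protocol Buffers, I tried to compress the output files. With Python this is quite simple and does not require any work with streams.

Since most of our code is written in C++, I would like to compress and decompress the files in the same language. I have tried the Boost gzip library but could not get it to work (the output is not compressed):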

int writeEventCollection(HEP::MyProtoBufClass* protobuf, std::string filename, unsigned int compressionLevel) {
    ofstream file(filename.c_str(), ios_base::out | ios_base::binary);
    filtering_streambuf<output> out;
    out.push(gzip_compressor(compressionLevel));
    out.push(file);
    if (!protobuf->SerializeToOstream(&file)) { // serialising to the wrong stream, I assume
        cerr << "Failed to write ProtoBuf." << endl;
        return -1;
    }
    return 0;
}

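I have searched for ways to use GzipOutputStream and GzipInputStream with Protocol Buffers but could not find a working example.

As you have probably noticed, I am a beginner at best with streams, and I would really appreciate a fully working example based on http://code.google.com/apis/protocolbuffers/docs/cpptutorial.html (I have my address book; how do I save it to a gzip file?)

Thank you in advance.

EDIT: Working examples.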

Example 1, following the answer here on StackOverflow:

int writeEventCollection(shared_ptr<HEP::EventCollection> eCollection,
                         std::string filename, unsigned int compressionLevel) {
    filtering_ostream out;
    out.push(gzip_compressor(compressionLevel));
    out.push(file_sink(filename, ios_base::out | ios_base::binary));
    if (!eCollection->SerializeToOstream(&out)) {
        cerr << "Failed to write event collection." << endl;
        return -1;
    }
    return 0;
}

Example 2, following the answer on Google's Protobuf discussion group:

int writeEventCollection2(shared_ptr<HEP::EventCollection> eCollection,
                          std::string filename, unsigned int compressionLevel) {
    using namespace google::protobuf::io;
    int filedescriptor = open(filename.c_str(), O_WRONLY | O_CREAT | O_TRUNC,
                              S_IREAD | S_IWRITE);
    if (filedescriptor == -1) {
        throw "open failed on output file";
    }
    FileOutputStream file_stream(filedescriptor);
    GzipOutputStream::Options options;
    options.format = GzipOutputStream::GZIP;
    options.compression_level = compressionLevel;
    GzipOutputStream gzip_stream(&file_stream, options);
    bool ok = eCollection->SerializeToZeroCopyStream(&gzip_stream);
    // Finish the gzip stream first, then flush and close the file.
    // Calling close(filedescriptor) before the streams are flushed
    // can lose data that is still buffered.
    ok = gzip_stream.Close() && ok;
    ok = file_stream.Close() && ok; // also closes the file descriptor
    if (!ok) {
        cerr << "Failed to write event collection." << endl;
        return -1;
    }
    return 0;
}

Some comments on performance (reading the current format and writing 11146 files as ProtoBuf):

Example 1:

real    13m1.185s 
user    11m18.500s 
sys     0m13.430s 
CPU usage: 65-70% 
Size of test sample: 4.2 GB (uncompressed 7.7 GB, our current compressed format: 7.7 GB)

Example 2:

real    12m37.061s 
user    10m55.460s 
sys     0m11.900s 
CPU usage: 90-100% 
Size of test sample: 3.9 GB

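It seems that Google's method uses the CPU more efficiently, is slightly faster (although I expect this difference to be within measurement accuracy), and produces a roughly 7% smaller dataset with the same compression setting.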
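
The percentages can be checked from the figures quoted above. The short sketch below only redoes that arithmetic; the function names are made up for illustration.

```cpp
#include <cstdio>

// Relative reduction of `smaller` with respect to `larger`, in percent.
double percentSmaller(double larger, double smaller) {
    return (larger - smaller) / larger * 100.0;
}

// Convert a `time`-style reading such as 13m1.185s into seconds.
double toSeconds(int minutes, double seconds) {
    return minutes * 60.0 + seconds;
}

// Print the size and wall-clock comparison for the two examples.
void printComparison() {
    double real1 = toSeconds(13, 1.185);   // Example 1, "real"
    double real2 = toSeconds(12, 37.061);  // Example 2, "real"
    std::printf("size: %.1f%% smaller\n", percentSmaller(4.2, 3.9));
    std::printf("wall time: %.1f%% shorter\n", percentSmaller(real1, real2));
}
```

With the numbers from the two runs this gives a size reduction of about 7.1% and a wall-clock reduction of about 3.1%, which is small enough to be plausibly within run-to-run variation.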

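Your assumption is correct: the code you posted does not work because you are writing directly to the ofstream rather than through the filtering_streambuf. To make it work, you can use a filtering_ostream instead: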

ofstream file(filename.c_str(), ios_base::out | ios_base::binary); 
filtering_ostream out; 
out.push(gzip_compressor(compressionLevel)); 
out.push(file);
if (!protobuf->SerializeToOstream(&out)) {
    // ... etc.
}

Or, more concisely, using file_sink:

filtering_ostream out; 
out.push(gzip_compressor(compressionLevel)); 
out.push(file_sink(filename, ios_base::out | ios_base::binary));
if (!protobuf->SerializeToOstream(&out)) {
    // ... etc.
}
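
To see why the stream you serialize to matters, here is a toy illustration that uses only the standard library. `UppercaseFilterBuf` is a made-up stand-in for a real filter such as gzip_compressor; it is not Boost code, it only shows that data written to the sink directly never passes through the filter.

```cpp
#include <cctype>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Toy stand-in for a filter: upper-cases every character and forwards
// it to the wrapped stream.
class UppercaseFilterBuf : public std::streambuf {
public:
    explicit UppercaseFilterBuf(std::ostream& sink) : sink_(sink) {}

protected:
    // Called for every character because no put area is set up.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch);
        }
        unsigned char c =
            static_cast<unsigned char>(traits_type::to_char_type(ch));
        sink_.put(static_cast<char>(std::toupper(c)));
        return sink_ ? ch : traits_type::eof();
    }

private:
    std::ostream& sink_;
};

// Writes through the filter: the sink receives transformed data.
std::string writeThroughFilter(const std::string& text) {
    std::ostringstream sink;
    UppercaseFilterBuf filter(sink);
    std::ostream out(&filter);
    out << text;
    return sink.str();
}

// Writes to the sink directly: the filter exists but is never used,
// which mirrors passing &file instead of &out to SerializeToOstream.
std::string writeBypassingFilter(const std::string& text) {
    std::ostringstream sink;
    UppercaseFilterBuf filter(sink);
    std::ostream out(&filter);
    (void)out;  // constructed but never written to
    sink << text;
    return sink.str();
}
```

Here `writeThroughFilter("abc")` returns `"ABC"`, while `writeBypassingFilter("abc")` returns `"abc"` unchanged. The original code had the same shape: the gzip filter chain was built, but the message was serialized straight into the file.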

I hope this helps!