Huffman Decoding Compressed File

Keywords: file, compression, decoding, Huffman. Updated: 2023-10-16

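I have a program that builds a Huffman tree from the ASCII character frequencies in a text input file. The Huffman codes are stored in a 256-element array of strings, with an empty string for any character that does not occur in the input. The program also encodes the input and writes a compressed output file.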

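I am now trying to decompress and decode that output file. It is opened as the input file, and the new output file should contain a decoded message identical to the original text input file.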

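My thinking for this part of the assignment is to work backwards from the encoding function I wrote: read 8 bits at a time and somehow decode the message by updating a variable (string n), which starts out as an empty string, by recursing through the Huffman tree until I reach a code that I can write to the output file.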

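I have started the function, but I am stuck and am looking for some guidance on writing my decodeOutput function. All help is appreciated.
My completed encodeOutput function and my decodeOutput function are below: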

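(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)

(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)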

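code[256] is a parameter of both functions and holds the Huffman code for each character. For example, if 'H' (ASCII 72) is encoded as "111", then "111" is stored in code[72] when the array is passed to the functions.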

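#include <fstream>
#include <string>
using namespace std;

// die() is assumed to be a helper defined elsewhere in the program (not shown here).
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {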
    ifstream ifile;//to read file
    ifile.open(fileName, ios::binary);
    if (!ifile) //to check if file is open or not
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get();//read one char from file and store it in int
    char buffer = 0, bit_count = 0;
    while (read != -1) {
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through bits (code[read] outputs huffman code)
            buffer <<= 1;
            buffer |= code[read][b] != '0'; 
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }
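    if (bit_count != 0) // flush the last partial byte, left-aligned and zero-padded
        ofile.put(static_cast<char>(buffer << (8 - bit_count))); // put() writes one raw byte; << on the shifted int would print decimal digits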
    ifile.close();
    ofile.close();
}
//Work in progress
void decodeOutput(const string & fileName2, const string & fileName3, string code[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    string n = ""; // bits accumulated so far
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { // from most to least significant bit
            if ((c >> p & 1) == 0) { // if bit is a 0 (the integer 0, not the character '0')
            }
            else { // if bit is a 1
            }
            // TODO: once n corresponds to a complete code, output the decoded character to the output file
        }
    }
}

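Decoding is easier if you have the original Huffman tree that was used to construct the codebook. But suppose you only have the codebook (i.e., string code[256]) and not the original Huffman tree. What you can do is the following: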

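1. Partition the codebook into groups of codewords of the same length. Say the codebook contains codewords of n different lengths: l_0 < l_1 < ... < l_(n-1).
2. Read (but do not yet consume) k bits from the input file, with k increasing from l_0 up to l_(n-1), until the k input bits match a codeword of length k = l_i for some i.
3. Output the 8-bit character corresponding to the matching codeword, and consume those k bits from the input file.
4. Repeat until all bits of the input file have been consumed.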
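The steps above can be sketched roughly as follows. This is only a minimal illustration, not the asker's final decodeOutput: the helper names toBitString, groupByLength and decodeBits are hypothetical, and for simplicity the sketch works on a string of '0'/'1' characters in memory rather than directly on the file stream.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical helper: unpack bytes into '0'/'1' characters, most significant
// bit first (the same bit order the encoder uses when packing).
std::string toBitString(const std::string& bytes) {
    std::string bits;
    for (unsigned char byte : bytes)
        for (int p = 7; p >= 0; --p)
            bits.push_back(((byte >> p) & 1) ? '1' : '0');
    return bits;
}

// Hypothetical helper (step 1): group the non-empty codewords by length.
// std::map keeps the lengths in increasing order.
std::map<std::size_t, std::unordered_map<std::string, unsigned char>>
groupByLength(const std::string code[256]) {
    std::map<std::size_t, std::unordered_map<std::string, unsigned char>> groups;
    for (int ch = 0; ch < 256; ++ch)
        if (!code[ch].empty())
            groups[code[ch].size()][code[ch]] = static_cast<unsigned char>(ch);
    return groups;
}

// Hypothetical helper (steps 2 to 4): try each codeword length in increasing
// order at the current position; on a match, emit the character and consume
// the bits. Stops when the remaining bits match no codeword.
std::string decodeBits(const std::string& bits, const std::string code[256]) {
    const auto groups = groupByLength(code);
    std::string out;
    std::size_t pos = 0;
    while (pos < bits.size()) {
        bool matched = false;
        for (const auto& group : groups) {
            const std::size_t len = group.first;
            if (pos + len > bits.size()) break;      // not enough bits left
            const auto it = group.second.find(bits.substr(pos, len));
            if (it != group.second.end()) {
                out.push_back(static_cast<char>(it->second));
                pos += len;                          // consume the k bits
                matched = true;
                break;
            }
        }
        if (!matched) break;                         // leftover bits, e.g. padding
    }
    return out;
}
```

One caveat with the encoder shown in the question: the last byte is padded with zero bits, and those padding bits can themselves match a codeword. With the codes a = "0", b = "10", c = "11", the bits 010110 decode to "abca", but the padded byte 0x58 (01011000) decodes to "abcaaa". A real decoder would need some way to know where the data ends, for example a stored character count, which is an assumption beyond what the question describes.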

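If the codebook was constructed correctly, and you always look up codewords in order of increasing length, you should never encounter a sequence of input bits for which no matching codeword can be found.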
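One property a correctly constructed codebook must have is that no codeword is a prefix of another; otherwise matching in order of increasing length could stop at the wrong codeword. A quick sanity check might look like the sketch below. The function name isPrefixFree is hypothetical, and this test covers only the prefix property, not everything that "constructed correctly" implies.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true if no non-empty codeword is a prefix of
// (or equal to) another. After sorting lexicographically, a codeword that is
// a prefix of any other codeword is also a prefix of its immediate successor,
// so comparing neighbours is sufficient.
bool isPrefixFree(const std::string code[256]) {
    std::vector<std::string> words;
    for (int ch = 0; ch < 256; ++ch)
        if (!code[ch].empty()) words.push_back(code[ch]);
    std::sort(words.begin(), words.end());
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        const std::string& a = words[i];
        const std::string& b = words[i + 1];
        // True when the first a.size() characters of b are exactly a.
        if (b.compare(0, a.size(), a) == 0) return false;
    }
    return true;
}
```

For example, the codebook a = "0", b = "10", c = "11" passes this check, while a = "0", b = "01" fails it because "0" is a prefix of "01".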

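In effect, in terms of the equivalent Huffman tree, every time you compare k input bits with the group of codewords of length k, you are checking whether a leaf at tree level k holds a codeword that matches the input; every time you increase k to the next longer group of codewords, you are moving down the tree to a deeper level (counting the root as level 0).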
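To make the tree equivalence concrete, the codebook can be turned back into a binary tree and walked one bit at a time. The following is a sketch under assumptions: Node, buildTree and decodeWithTree are hypothetical names, and symbolCount (the number of characters in the original text) is assumed to be known to the decoder, which the question's file format does not currently provide.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical node type: children are indices into a vector, -1 if absent.
struct Node {
    int child[2] = {-1, -1};
    int symbol = -1;            // byte value at a leaf, -1 at internal nodes
};

// Hypothetical helper: rebuild the tree implied by the codebook.
// nodes[0] is the root (level 0); each codeword bit descends one level.
std::vector<Node> buildTree(const std::string code[256]) {
    std::vector<Node> nodes(1);
    for (int ch = 0; ch < 256; ++ch) {
        if (code[ch].empty()) continue;
        int cur = 0;
        for (char bit : code[ch]) {
            const int b = (bit != '0');
            if (nodes[cur].child[b] == -1) {
                nodes[cur].child[b] = static_cast<int>(nodes.size());
                nodes.push_back(Node());
            }
            cur = nodes[cur].child[b];
        }
        nodes[cur].symbol = ch;  // the codeword ends at this leaf
    }
    return nodes;
}

// Hypothetical helper: walk the tree bit by bit (most significant bit first),
// emit a character at each leaf, and stop after symbolCount characters so
// that padding bits in the last byte are ignored.
std::string decodeWithTree(const std::string& bytes,
                           const std::string code[256],
                           std::size_t symbolCount) {
    const std::vector<Node> nodes = buildTree(code);
    std::string out;
    int cur = 0;
    for (unsigned char byte : bytes) {
        for (int p = 7; p >= 0; --p) {
            if (out.size() == symbolCount) return out;
            cur = nodes[cur].child[(byte >> p) & 1];
            if (cur == -1) return out;   // no such branch: malformed input
            if (nodes[cur].symbol != -1) {
                out.push_back(static_cast<char>(nodes[cur].symbol));
                cur = 0;                 // restart at the root
            }
        }
    }
    return out;
}
```

With the codes a = "0", b = "10", c = "11", the single byte 0x58 (01011000) and a symbolCount of 4 decode to "abca"; the two trailing padding bits are never visited.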