Calculating symbol frequencies of a binary file in C#: not working, but the equivalent C++ code works

Updated: 2023-10-16

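I am trying to calculate the frequency of symbols in a binary file in C#. I have already done this in C++ and it works fine; I switched from C++ to C# because I have to implement the same thing in C#.

Note: I am not supposed to use a LUT/array, only a linked list.

By frequency I mean the number of times a symbol is repeated. By symbols I mean this: if we view the binary file with xxd -b BinaryFile.bin, we get many combinations of 0s and 1s in groups of 8 bits. The number of times each such symbol is repeated is its frequency.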
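
To illustrate what a "symbol" means here, the following is a minimal sketch that prints every byte of a file as its 8-bit pattern, roughly the way xxd -b presents the data. The names BitDump and ToBits are made up for this illustration and are not part of the program in the question.

```csharp
using System;
using System.IO;

public static class BitDump
{
    // Formats one byte as its 8-bit binary pattern, e.g. 5 -> "00000101".
    public static string ToBits(byte value)
    {
        return Convert.ToString(value, 2).PadLeft(8, '0');
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: BitDump <file>");
            return;
        }

        // Each printed line is one symbol in the sense used above.
        foreach (byte b in File.ReadAllBytes(args[0]))
            Console.WriteLine(ToBits(b));
    }
}
```

Counting how often each printed pattern occurs gives the frequencies discussed in the rest of the question.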

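Now, how I am trying to do it: I run the C# program in a terminal with mono filename.exe BinaryFile.bin, where filename is the name of the file in which I wrote the code using Notepad++.

Logic: I read each symbol in the binary file. If it has not occurred before, I add it to the tail of the linked list; if it has, I increment its frequency. I repeat this for the whole binary file.

Code:

C# code (complete): (it does not work correctly; I have marked the part of the code that contains the problem, and I am posting the complete code because you may need it):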

//// Problem-containing part starts here ////
public Huffman(string[] args) // called from MyClass
{
    Node tree = null;
    int counter = 0;
    using (var stream = new BinaryReader(System.IO.File.OpenRead(args[0])))
    {
        while (stream.BaseStream.Position < stream.BaseStream.Length)
        {
            int processingValue = stream.ReadByte();
            Node ppt, pt, temp;
            bool is_there = false;
            ppt = pt = tree;
            while (pt != null)
            {
                if (pt.symbol == processingValue)
                {
                    pt.freq++;
                    is_there = true;
                    break;
                }
                ppt = pt;
                pt = pt.next;
            }
            if (is_there == false)
            {
                temp = new Node();
                temp.symbol = processingValue;
                temp.freq = 1;
                temp.left = null;
                temp.right = null;
                temp.next = null;
                temp.id = (++total_nodes);
                temp.is_processed = 0;
                if (tree == null)
                {
                    tree = temp;
                }
                else
                {
                    ppt.next = temp;
                }
                // The same debugging check I used in C++ to see what symbol and freq contain,
                // but here they contain different values.
                // The C# and C++ outputs differ, although they were supposed to be the same.
                Node chc = tree;
                while (chc != null)
                {
                    Console.WriteLine("  sym: " + chc.symbol);
                    Console.WriteLine("  freq: " + chc.freq);
                    chc = chc.next;
                }
            }
        }
        stream.Close();
    }

}
//// Problem-containing part ends here ////

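Difference between the C++ and C# output:

(1) When I display the output of the C++ program, it works correctly: the part of the code I wrote to debug/check the output on the terminal shows that the code executes correctly. The same debugging code in C# does not show the same output as the C++ version. It should, because the two statements that print "symbol" and "frequency" are kept at the same place in the program.

(2) The C# output makes me think that the while loop executes fewer times in C# than in C++, because the C# terminal output does not show as many repetitions of the frequencies and symbols. See the output of both:

C# output on the terminal: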

hp@ubuntu:~/Desktop/$ mono check1.exe out.bin 
sym: 0
freq: 1
sym: 0
freq: 200
sym: 1
freq: 1
sym: 0
freq: 200
sym: 1
freq: 198
sym: 2
freq: 1
sym: 0
freq: 200
sym: 1
freq: 198
sym: 2
freq: 195
sym: 3
freq: 1
sym: 0
freq: 200
sym: 1
freq: 198
sym: 2
freq: 195
sym: 3
freq: 189
sym: 4
freq: 1
hp@ubuntu:~/Desktop/

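The C++ output is: (the counter here actually starts from "0", not "196", but I cannot show the complete output: the file is large, so the output is too long for the terminal to display in full, and it only shows the end of the output)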

hp@ubuntu:~/Desktop/$ ./filename out.bin
// Counter starts from "0", but the terminal cannot show everything, so the output below starts from "196".
counter: 196
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  1
counter: 197
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  2
counter: 198
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  3
counter: 199
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  4
counter: 200
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  5
counter: 201
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  6
counter: 202
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  7
counter: 203
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  8
counter: 204
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  9
counter: 205
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  10
counter: 206
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  11
counter: 207
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  12
counter: 208
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  13
counter: 209
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  14
counter: 210
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  15
counter: 211
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  16
counter: 212
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  17
counter: 213
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  18
counter: 214
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  19
counter: 215
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  20
counter: 216
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  21
counter: 217
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  22
counter: 218
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  23
counter: 219
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  24
counter: 220
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  25
counter: 221
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  26
counter: 222
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  27
counter: 223
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  28
counter: 224
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  29
counter: 225
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  30
counter: 226
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  31
counter: 227
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  32
counter: 228
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  33
counter: 229
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  34
counter: 230
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  35
counter: 231
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  36
counter: 232
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  37
counter: 233
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  38
counter: 234
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  39
counter: 235
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  40
counter: 236
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  41
counter: 237
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  42
counter: 238
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  43
counter: 239
sym:  0
freq:  50
sym:  1
freq:  50
sym:  2
freq:  48
sym:  3
freq:  48
sym:  4
freq:  44

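Questions:

(1) Why does the C# code show different output from the C++ code? I even tried it with the same binary file (a .bin file in my case).

(2) Could you help me get rid of this problem? Many thanks.

I may have completely misunderstood this... Do you just want to know the number of occurrences of each byte value (0..255) in a specified file?

If so, a simple way to do it is this: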

var counts = new int[256];  // Assumes files aren't longer than 2GB.
string filename = "<Your filename goes here>";
foreach (byte b in File.ReadAllBytes(filename)) // Will run out of memory
    ++counts[b];                                // for very large files!
for (int i = 0; i < counts.Length; ++i)
    Console.WriteLine("Symbol {0} occurred {1} times.", i, counts[i]);

However, that is so much simpler than what you're doing that I feel I must have misunderstood...
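
The comments in the array snippet point out two limits: int counters and loading the whole file into memory. A minimal sketch of a streaming variant follows, assuming long counters and a 64 KiB read buffer are acceptable. The names ByteHistogram and Count are invented for this illustration, and the buffer size is an arbitrary choice.

```csharp
using System;
using System.IO;

public static class ByteHistogram
{
    // Counts how often each byte value (0..255) occurs in the stream,
    // reading in fixed-size chunks so the whole file never sits in memory.
    public static long[] Count(Stream input)
    {
        var counts = new long[256];
        var buffer = new byte[64 * 1024];   // assumed chunk size
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; ++i)
                ++counts[buffer[i]];
        }
        return counts;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: ByteHistogram <file>");
            return;
        }

        using (var stream = File.OpenRead(args[0]))
        {
            long[] counts = Count(stream);
            for (int i = 0; i < counts.Length; ++i)
                Console.WriteLine("Symbol {0} occurred {1} times.", i, counts[i]);
        }
    }
}
```

Taking a Stream rather than a file name keeps the counting logic easy to exercise with a MemoryStream.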


[Edit]

I couldn't fix your original code, but here is a sample program that solves the problem using a linked list:

using System;
using System.IO;

namespace ConsoleApp1
{
    public sealed class Node
    {
        public byte Symbol { get; set; }
        public int Count   { get; set; }
        public Node Next   { get; set; }
    }

    sealed class Program
    {
        private void run()
        {
            // The head node starts with Symbol == 0 and Count == 0,
            // so it also serves as the entry for symbol 0.
            var linkedList = new Node();
            string filename = @"C:\Test\t.cs"; // example path; use your own file
            foreach (byte symbol in File.ReadAllBytes(filename))
                addSymbol(symbol, linkedList);
            for (int symbol = 0; symbol < 256; ++symbol)
            {
                int count = countForSymbol((byte)symbol, linkedList);
                Console.WriteLine("Symbol {0} occurred {1} times.", symbol, count);
            }
        }

        // Increments the count of an existing symbol, or appends a new node at the tail.
        private static void addSymbol(byte symbol, Node head)
        {
            Node last = head;
            while (head != null)
            {
                last = head;
                if (head.Symbol == symbol)
                {
                    ++head.Count;
                    return;
                }
                else
                {
                    head = head.Next;
                }
            }
            last.Next = new Node
            {
                Symbol = symbol,
                Count = 1
            };
        }

        // Returns the stored count for a symbol, or 0 if it never occurred.
        private int countForSymbol(byte symbol, Node head)
        {
            while (head != null)
            {
                if (head.Symbol == symbol)
                    return head.Count;
                else
                    head = head.Next;
            }
            return 0;
        }

        private static void Main()
        {
            new Program().run();
        }
    }
}
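
One observation about the code in the question, offered tentatively: the loop that prints sym and freq sits inside the if (is_there == false) branch, so the list is dumped only when a new symbol is appended, and counter is declared but never printed. That would be consistent with the C# output showing just five dumps. The C++ source is not shown, so it cannot be confirmed that this is the only difference, and it does not explain why the counts themselves differ. The sketch below dumps the list after every byte read, which is what the C++ output suggests. FreqNode, FrequencyTrace, Add and Dump are hypothetical names for this illustration and carry only the fields the counting loop needs.

```csharp
using System;
using System.IO;

// Hypothetical node type for this sketch (not the Node class from the question).
public sealed class FreqNode
{
    public int Symbol;
    public int Freq;
    public FreqNode Next;
}

public static class FrequencyTrace
{
    // Counts one byte value: bumps the matching node, or appends a new node
    // at the tail. Returns the head, which changes only on the first insert.
    public static FreqNode Add(FreqNode head, int value)
    {
        FreqNode prev = null;
        for (FreqNode cur = head; cur != null; cur = cur.Next)
        {
            if (cur.Symbol == value)
            {
                cur.Freq++;
                return head;
            }
            prev = cur;
        }

        var node = new FreqNode { Symbol = value, Freq = 1 };
        if (prev == null)
            return node;          // the list was empty
        prev.Next = node;
        return head;
    }

    // Prints the whole list, preceded by the index of the byte just processed.
    public static void Dump(FreqNode head, int counter, TextWriter output)
    {
        output.WriteLine("counter: " + counter);
        for (FreqNode cur = head; cur != null; cur = cur.Next)
        {
            output.WriteLine("  sym: " + cur.Symbol);
            output.WriteLine("  freq: " + cur.Freq);
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: FrequencyTrace <file>");
            return;
        }

        FreqNode head = null;
        int counter = 0;
        using (var stream = File.OpenRead(args[0]))
        {
            int value;
            while ((value = stream.ReadByte()) != -1)   // -1 marks end of stream
            {
                head = Add(head, value);
                Dump(head, counter++, Console.Out);     // runs for every byte read
            }
        }
    }
}
```

For a large file this prints a great deal; redirecting the output to a file (for example by appending > trace.txt to the command) makes it possible to compare the two programs from the very first counter value instead of only the tail that fits in the terminal.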