Get SHA1 of Unicode string in Crypto++

Keywords: SHA1, string, Unicode, Crypto++, get. Updated: 2023-10-16

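I am learning C++ on my own, and I have been stuck on one problem for more than a week. I hope you can help me.

I need to get the SHA1 digest of a Unicode string (such as Привет), but I do not know how to do it.

I tried to do it like this, but it returns the wrong digest!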

For wstring('Ы') it returns: A469A61DF29A7568A6CC63318EA8741FA1CF2A7
I need: 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373

Regards, and sorry for my English :).

CryptoPP 5.6.2, MSVC++ 2013

#include <iostream>
#include "cryptopp562/cryptlib.h"
#include "cryptopp562/sha.h"
#include "cryptopp562/hex.h"

int main() {
    std::wstring string(L"Ы");
    int bs_size = (int)string.length() * sizeof(wchar_t);
    byte* bytes_string = new byte[bs_size];
    int n = 0; // real bytes count
    for (int i = 0; i < string.length(); i++) {
        wchar_t wcharacter = string[i];
        int high_byte = wcharacter & 0xFF00;
        high_byte = high_byte >> 8;
        int low_byte = wcharacter & 0xFF;
        // The high byte is written only when it is non-zero
        if (high_byte != 0) {
            bytes_string[n++] = (byte)high_byte;
        }
        bytes_string[n++] = (byte)low_byte;
    }
    CryptoPP::SHA1 sha1;
    std::string hash;
    CryptoPP::StringSource ss(bytes_string, n, true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );
    std::cout << hash << std::endl;
    return 0;
}

"I need to get the SHA1 digest of a Unicode string (such as Привет), but I do not know how to do it."

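The trick here is that you need to know how the Unicode string is encoded. On Windows, wchar_t is 2 octets, while on Linux wchar_t is 4 octets. There is a Crypto++ wiki page on Character Set Considerations, but it is not very good.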
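To make the size difference concrete, here is a small sketch that exposes the raw in-memory bytes of a wide string, which is what a hash would see if the buffer were passed in unchanged. The helper name raw_bytes is made up for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies the in-memory bytes of a wide string.
// The number and order of the bytes depend on sizeof(wchar_t) and on the
// byte order of the machine, so they differ between platforms.
std::vector<unsigned char> raw_bytes(const std::wstring& text) {
    const unsigned char* first =
        reinterpret_cast<const unsigned char*>(text.data());
    return std::vector<unsigned char>(
        first, first + text.size() * sizeof(wchar_t));
}
```

For a one-character string such as L"Ы", the result has 2 bytes where wchar_t is 2 octets and 4 bytes where it is 4 octets, so the two platforms would feed different input to SHA-1.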

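To interoperate most effectively, always use UTF-8. That means you convert UTF-16 or UTF-32 to UTF-8. Because you are on Windows, you will need to call the WideCharToMultiByte function and convert with CP_UTF8. If you were on Linux, you would use libiconv.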
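If you want to see what such a conversion does without depending on a platform API, a hand-written converter can serve as a rough sketch. It is a stand-in for WideCharToMultiByte or libiconv, not a replacement: the names append_utf8 and wide_to_utf8 are invented here, the input is not validated, and it assumes wchar_t holds UTF-16 code units when it is 2 bytes wide and UTF-32 code points otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a wide string to UTF-8.
// Assumption: 2-byte wchar_t means UTF-16, anything wider means UTF-32.
std::string wide_to_utf8(const std::wstring& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(text[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < text.size()) {
            const std::uint32_t low = static_cast<std::uint32_t>(text[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine a UTF-16 surrogate pair into one code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this sketch, wide_to_utf8(L"Ы") yields the two bytes D0 AB on both kinds of platform, and that byte string is what you would then pass to the hash.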

Crypto++ has a built-in function called StringNarrow that uses the standard C++ library. It is in the file misc.h. Be sure to call setlocale before using it.
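The reason setlocale matters is easier to see with the standard library alone. The following is only a sketch of locale-based narrowing with std::wcstombs; narrow_with_locale is a made-up name, it is not the Crypto++ function, and whether your Crypto++ version behaves the same way in every detail is an assumption.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: narrows a wide string with std::wcstombs, which
// converts according to the current C locale.
// Typical use, where the locale name is an assumption about the host:
//   std::setlocale(LC_ALL, "en_US.UTF-8");
//   std::string bytes = narrow_with_locale(L"...");
std::string narrow_with_locale(const std::wstring& text) {
    // Worst case: MB_CUR_MAX bytes per wide character, plus a terminator.
    std::vector<char> buffer(text.size() * MB_CUR_MAX + 1);
    const std::size_t written =
        std::wcstombs(&buffer[0], text.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::string();  // some character has no representation
    }
    return std::string(&buffer[0], written);
}
```

In the default "C" locale, non-ASCII input typically fails to convert, which is why the locale has to be set first. The bytes you get back are in the locale's encoding, so this only produces UTF-8 if a UTF-8 locale is active.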

Stack Overflow has a few questions on using the Windows function. See, for example, How do you properly use WideCharToMultiByte.

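"I need - 8dbe718ab1e0c4d75f7ab50fc9a53ec4f0528373"

What is the hash (SHA-1, SHA-256, ...)? Is it an HMAC (a keyed hash)? Is the information salted (like a password in storage)? How is it encoded? I have to ask because I cannot reproduce your desired result: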

SHA-1:   2805AE8E7E12F182135F92FB90843BB1080D3BE8
SHA-224: 891CFB544EB6F3C212190705F7229D91DB6CECD4718EA65E0FA1B112
SHA-256: DD679C0B9FD408A04148AA7D30C9DF393F67B7227F65693FFFE0ED6D0F0ADE59
SHA-384: 0D83489095F455E4EF5186F2B071AB28E0D06132ABC9050B683DA28A463697AD
1195FF77F050F20AFBD3D5101DF18C0D
SHA-512: 0F9F88EE4FA40D2135F98B839F601F227B4710F00C8BC48FDE78FF3333BD17E4
1D80AF9FE6FD68515A5F5F91E83E87DE3C33F899661066B638DB505C9CC0153D

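Here is the program I used. Be sure to specify the length of the wide string. If you don't (and use -1 for the length), then WideCharToMultiByte will include the terminating ASCII-Z in its calculations. Since we are using a std::string, we do not need the function to include the ASCII-Z terminator.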

// Header paths are an assumption; adjust them to your Crypto++ installation.
#include <windows.h>
#include <iostream>
#include <stdexcept>
#include <string>
#include <cryptopp/sha.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>
#include <cryptopp/channels.h>

using namespace std;
using namespace CryptoPP;

int main(int argc, char* argv[])
{
    wstring m1 = L"Привет"; string m2;

    // First call: ask how many UTF-8 bytes are needed (no terminator,
    // because an explicit length is passed instead of -1).
    int req = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), NULL, 0, NULL, NULL);
    if (req <= 0)
        throw runtime_error("Failed to convert string");

    // Second call: perform the conversion into m2.
    m2.resize((size_t)req);
    int cch = WideCharToMultiByte(CP_UTF8, 0, m1.c_str(), (int)m1.length(), &m2[0], (int)m2.length(), NULL, NULL);
    if (cch <= 0)
        throw runtime_error("Failed to convert string");

    // Should not be required
    m2.resize((size_t)cch);

    string s1, s2, s3, s4, s5;
    SHA1 sha1; SHA224 sha224; SHA256 sha256; SHA384 sha384; SHA512 sha512;

    HashFilter f1(sha1, new HexEncoder(new StringSink(s1)));
    HashFilter f2(sha224, new HexEncoder(new StringSink(s2)));
    HashFilter f3(sha256, new HexEncoder(new StringSink(s3)));
    HashFilter f4(sha384, new HexEncoder(new StringSink(s4)));
    HashFilter f5(sha512, new HexEncoder(new StringSink(s5)));

    // Send the same input to all five hash filters.
    ChannelSwitch cs;
    cs.AddDefaultRoute(f1);
    cs.AddDefaultRoute(f2);
    cs.AddDefaultRoute(f3);
    cs.AddDefaultRoute(f4);
    cs.AddDefaultRoute(f5);

    StringSource ss(m2, true /*pumpAll*/, new Redirector(cs));

    cout << "SHA-1:   " << s1 << endl;
    cout << "SHA-224: " << s2 << endl;
    cout << "SHA-256: " << s3 << endl;
    cout << "SHA-384: " << s4 << endl;
    cout << "SHA-512: " << s5 << endl;
    return 0;
}

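You say "but it returns the wrong digest" - what are you comparing it with?

Key point: digests such as SHA-1 do not operate on sequences of characters, but on sequences of bytes.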

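What you are doing in this code is generating an ad-hoc encoding of the Unicode characters in the string "Ы". If the characters in the string are all in the BMP ("Basic Multilingual Plane", which is true in this case), and if the numbers that end up in wcharacter are integers representing Unicode code points (which is probably true, but I do not think it is guaranteed), then this encoding will, as it happens, match the big-endian UTF-16 encoding. That holds only for characters whose high byte is non-zero, because the code skips a zero high byte.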
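To check that claim, the loop from the question can be put next to a plain big-endian UTF-16 encoder. This is a sketch for comparison only: both function names are invented, and std::u16string is used instead of std::wstring as a simplification so that the result does not depend on the platform's wchar_t.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// The encoding from the question: the high byte is emitted only when it
// is non-zero, followed by the low byte.
Bytes question_encoding(const std::u16string& text) {
    Bytes out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned high = (text[i] >> 8) & 0xFF;
        const unsigned low = text[i] & 0xFF;
        if (high != 0) {
            out.push_back(static_cast<unsigned char>(high));
        }
        out.push_back(static_cast<unsigned char>(low));
    }
    return out;
}

// Big-endian UTF-16: always two bytes per code unit, high byte first.
Bytes utf16be(const std::u16string& text) {
    Bytes out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        out.push_back(static_cast<unsigned char>((text[i] >> 8) & 0xFF));
        out.push_back(static_cast<unsigned char>(text[i] & 0xFF));
    }
    return out;
}
```

For u"Ы" (U+042B) both functions produce the bytes 04 2B. For u"A" the question's loop produces the single byte 41 while UTF-16BE is 00 41, so the two only agree when every character has a non-zero high byte.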

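If the digest you are comparing against was computed after turning the input string into a byte sequence with UTF-8 (which is quite likely), then that produces a different byte sequence from yours, and so the SHA-1 digest of that sequence will differ from the digest you calculate here.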
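A quick way to convince yourself is to dump the bytes that each encoding produces for the same character. The helpers below are a sketch with invented names; they only handle a single BMP code point (no surrogates), which is enough for Ы.

```cpp
#include <cstddef>
#include <string>

// Formats a byte string as upper-case hexadecimal.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}

// UTF-8 bytes for one BMP code point (1 to 3 bytes).
std::string utf8_bytes(char16_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// UTF-16 bytes for one BMP code point, low byte first (little-endian).
std::string utf16le_bytes(char16_t cp) {
    std::string out;
    out += static_cast<char>(cp & 0xFF);
    out += static_cast<char>((cp >> 8) & 0xFF);
    return out;
}
```

For U+042B, to_hex(utf8_bytes(0x042B)) gives "D0AB" and to_hex(utf16le_bytes(0x042B)) gives "2B04". Different bytes in, different digest out.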

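So:

Check what encoding your test string is using.
You would be best off using library functions to explicitly generate a UTF-16 or UTF-8 encoding (as appropriate) of the string you want to process, so that the byte sequence you are working with is what you think it is.

There is an excellent introduction to Unicode and encodings in the aptly named document The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).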
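One library route that needs nothing beyond the C++11 standard library is std::wstring_convert, sketched below. Treat it as one option under assumptions rather than a recommendation: the facility is deprecated since C++17, the function name is made up, and std::codecvt_utf8<wchar_t> treats a 2-byte wchar_t as UCS-2, so characters outside the BMP would need std::codecvt_utf8_utf16 on such platforms.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8 with the standard
// library's conversion facet (C++11; deprecated since C++17).
// to_bytes throws std::range_error if the input cannot be converted.
std::string to_utf8_with_codecvt(const std::wstring& text) {
    std::wstring_convert<std::codecvt_utf8<wchar_t> > converter;
    return converter.to_bytes(text);
}
```

The returned std::string can be handed to the hash directly, and unlike a cast of the wide buffer it does not depend on sizeof(wchar_t) for BMP text.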

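This seems to work fine for me.

Rather than fiddling about extracting the pieces, I simply cast the wide character buffer to a const byte* and pass that (along with the adjusted size) to the hash function.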

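// Header paths are an assumption; adjust them to your Crypto++ installation.
#include <iostream>
#include <string>
#include <cryptopp/sha.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>

// Note: byte is a global typedef in Crypto++ 5.6.x; from 6.0 on it is CryptoPP::byte.
int main() {
    std::wstring string(L"Привет");
    CryptoPP::SHA1 sha1;
    std::string hash;
    CryptoPP::StringSource ss(
        reinterpret_cast<const byte*>(string.c_str()), // cast to const byte*
        string.size() * sizeof(std::wstring::value_type), // adjust for size
        true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );
    std::cout << hash << std::endl;
    return 0;
}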

Output:

C6F8291E68E478DD5BD1BC2EC2A7B7FC0CEE1420

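Edited to add:

The result will depend on the encoding. For example, I ran this on Linux, where wchar_t is 4 bytes. On Windows, I believe wchar_t may be only 2 bytes.

For consistency, it is better to use UTF-8 and store the text in an ordinary std::string. That also makes calling the API simpler: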

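// Header paths are an assumption; adjust them to your Crypto++ installation.
#include <iostream>
#include <string>
#include <cryptopp/sha.h>
#include <cryptopp/hex.h>
#include <cryptopp/filters.h>

int main() {
    std::string string("Привет"); // UTF-8 encoded
    CryptoPP::SHA1 sha1;
    std::string hash;
    CryptoPP::StringSource ss(
        string,
        true,
        new CryptoPP::HashFilter(sha1,
            new CryptoPP::HexEncoder(
                new CryptoPP::StringSink(hash)
            )
        )
    );
    std::cout << hash << std::endl;
    return 0;
}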

Output:

2805AE8E7E12F182135F92FB90843BB1080D3BE8
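One caveat with the narrow literal: it only contains UTF-8 bytes if the compiler stores it that way, which depends on the source file encoding and compiler settings (MSVC, for instance, may use the system code page). A way to take that variable out of the picture, sketched here with a made-up function name, is to spell the bytes out with escape sequences.

```cpp
#include <string>

// "Привет" written as explicit UTF-8 bytes, so the content does not
// depend on the encoding of the source file or on compiler switches.
std::string privet_utf8() {
    return "\xD0\x9F"   // П U+041F
           "\xD1\x80"   // р U+0440
           "\xD0\xB8"   // и U+0438
           "\xD0\xB2"   // в U+0432
           "\xD0\xB5"   // е U+0435
           "\xD1\x82";  // т U+0442
}
```

The string is 12 bytes long, two per Cyrillic letter. If hashing a plain "Привет" literal gives a different digest from hashing this string, the literal was most likely not stored as UTF-8.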