Unicode char to wstring

Keywords: wstring, character conversion, Unicode. Updated: 2023-10-16

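I am trying to send a C# string to a C++ wstring, and vice versa, over TCP.

I can successfully send the string data from C# (as Unicode, i.e. UTF-16) and receive it in C++ as a char array.

But I have no idea how to convert the char array to a wstring.

This is what C++ receives for "abcd" in UTF-16: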

    [0] 97 'a'  char
    [1] 0 ''  char
    [2] 98 'b'  char
    [3] 0 ''  char
    [4] 99 'c'  char
    [5] 0 ''  char
    [6] 100 'd' char
    [7] 0 ''  char

This is what C++ receives for "한글" in UTF-16:

    [0] 92 '' char
    [1] -43 '?' char
    [2] 0 ''  char
    [3] -82 '?' char

This is what C++ receives for "日本語" in UTF-16:

    [0] -27 '?' char
    [1] 101 'e' char
    [2] 44 ','  char
    [3] 103 'g' char
    [4] -98 '?' char
    [5] -118 '?'char

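Since UTF-8 doesn't support all Japanese characters, I tried to receive the data as UTF-16 (which is what C# strings use natively). However, I failed to convert these char arrays to wstring with every method I found.

Here is what I tried before: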

std::wstring_convert<std::codecvt_utf16<wchar_t>> myconv 
 -> expected wchar_t values
        [0] 54620 '한'   wchar_t
        [1] 44544 '글'   wchar_t
 -> actual values after using this
    [0] 23765 '峕'   wchar_t
    [1] 174 '®' wchar_t

/

std::wstring wsTmp(s.begin(), s.end()); 
 -> expected wchar_t values
            [0] 54620 '한'   wchar_t
            [1] 44544 '글'   wchar_t
 -> actual values after using this
        [0] 92 '' wchar_t
        [1] 65493 'ￕ'   wchar_t
        [2] 0 ''  wchar_t
        [3] 65454 'ﾮ'   wchar_t

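In both cases I converted the char array to a std::string and then converted that to a wstring, and the result was wrong.

Does anyone know how to convert non-English UTF-16 char data to wstring data?

Added: C# side code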

byte[] sendBuffer = Encoding.Unicode.GetBytes(Console.ReadLine());
clientSocket.Send(sendBuffer);

It converts "한글" into bytes like this:

    [0] 92  byte
    [1] 213 byte
    [2] 0   byte
    [3] 174 byte

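I am trying to send a C# string to a C++ wstring, and vice versa, over TCP.

I can successfully send the string data from C# (as Unicode, i.e. UTF-16) and receive it in C++ as a char array.

It would be better, and more portable, to transmit the data as UTF-8 rather than UTF-16.

But I have no idea how to convert the char array to a wstring.

On platforms where wchar_t is 16 bits, such as Windows (which I assume you are using, given C#), you can copy the char array contents as-is into a std::wstring, for example: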

char *buffer = ...;
int buflen = ...;
std::wstring wstr(reinterpret_cast<wchar_t*>(buffer), buflen / sizeof(wchar_t));

If you need to support platforms where wchar_t is 32 bits, you can use std::wstring_convert:

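// requires <codecvt>, <locale> and <string>
char *buffer = ...;
int buflen = ...;
// Encoding.Unicode produces little-endian UTF-16, but codecvt_utf16 assumes
// big-endian unless std::little_endian is passed (otherwise 한 decodes as 23765).
std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>, wchar_t> conv;
std::wstring wstr = conv.from_bytes(std::string(buffer, buflen));
// or:
// std::wstring wstr = conv.from_bytes(buffer, buffer+buflen);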
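
Since std::wstring_convert and <codecvt> are deprecated as of C++17, a hand-written decoder may be worth considering. The following is only a minimal sketch, not part of the original answer: the function name utf16le_to_wstring is hypothetical, and it assumes the bytes are little-endian UTF-16 without a byte order mark, as C#'s Encoding.Unicode.GetBytes produces. A trailing odd byte is silently ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Read one little-endian 16-bit code unit. Going through unsigned char
// prevents bytes >= 0x80 from being sign-extended (the cause of 65493 above).
static std::uint32_t read_u16le(const char* p) {
    return static_cast<unsigned char>(p[0]) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(p[1])) << 8);
}

// Hypothetical helper: little-endian UTF-16 bytes -> std::wstring.
// With a 16-bit wchar_t the code units are copied one to one; with a
// 32-bit wchar_t surrogate pairs are combined into a single code point.
std::wstring utf16le_to_wstring(const char* data, std::size_t len) {
    std::wstring out;
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        std::uint32_t unit = read_u16le(data + i);
        if (sizeof(wchar_t) >= 4 && unit >= 0xD800 && unit <= 0xDBFF && i + 3 < len) {
            std::uint32_t low = read_u16le(data + i + 2);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                i += 2;  // the low surrogate has been consumed as well
            }
        }
        out.push_back(static_cast<wchar_t>(unit));
    }
    return out;
}
```

Called on the four bytes from the question (92, 213, 0, 174), this should yield the two expected wchar_t values 54620 and 44544.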

Since wchar_t is not very portable, consider using std::u16string/char16_t instead (if your compiler supports C++11 or later), as they are designed specifically for UTF-16 data.
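
As an illustration of the std::u16string route, here is a minimal sketch that packs the received bytes into char16_t code units. The helper name utf16le_to_u16string is an assumption made for this example, and the input is assumed to be little-endian UTF-16 (what Encoding.Unicode sends) with an even byte count.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: little-endian UTF-16 bytes -> char16_t code units.
// Surrogate pairs need no special handling because std::u16string stores
// UTF-16 code units directly.
std::u16string utf16le_to_u16string(const char* data, std::size_t len) {
    std::u16string out;
    out.reserve(len / 2);
    for (std::size_t i = 0; i + 1 < len; i += 2) {
        // Cast through unsigned char so negative char values are not sign-extended.
        unsigned lo = static_cast<unsigned char>(data[i]);
        unsigned hi = static_cast<unsigned char>(data[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

For the six bytes shown in the question for "日本語", the result should compare equal to u"\u65E5\u672C\u8A9E" regardless of how wide wchar_t is on the platform.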

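Since UTF-8 doesn't support all Japanese characters

Yes, it does. Unicode is the actual character set; UTF-8 is just an encoding that represents Unicode code points as a sequence of bytes. ALL UTFs (UTF-7, UTF-8, UTF-16 and UTF-32) support the ENTIRE Unicode character set, and the UTFs are designed to allow lossless conversion from one UTF to another.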
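
To make the lossless round trip concrete, the sketch below converts UTF-16 code units to UTF-8 bytes and back. Both function names are hypothetical, and the code assumes well-formed input: it does no validation of unpaired surrogates, overlong forms, or truncated sequences, so treat it as a demonstration rather than a production converter.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: UTF-16 code units -> UTF-8 bytes (well-formed input assumed).
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a surrogate pair into one code point above U+FFFF.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            std::uint32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: UTF-8 bytes -> UTF-16 code units (well-formed input assumed).
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow the lead byte
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        ++i;
        for (std::size_t k = 0; k < extra && i < in.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Split code points above U+FFFF back into a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Passing u"\u65E5\u672C\u8A9E" ("日本語") through utf16_to_utf8 and then utf8_to_utf16 should give back the identical string, which is the point the answer makes: every Japanese character that UTF-16 can carry, UTF-8 can carry too.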