Converting a character to \uXXXX format in C/C++

Keywords: conversion, \uXXXX format, character, C++. Last updated: 2023-10-16

I want to convert a string or char to \uXXXX format in a C/C++ program. Suppose I have the character 'A'; I want it printed as \u0041 (its standard Unicode escape).
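For a single character, the requested format is just the code point value in zero-padded hexadecimal. A minimal sketch, assuming the input is already a Unicode code point held in a char32_t (the helper name escape_code_point is hypothetical and not part of the question):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats one Unicode code point as a \uXXXX escape,
// or as \UXXXXXXXX when the value does not fit in four hex digits.
std::string escape_code_point(char32_t cp)
{
    char buf[16];
    const unsigned long value = static_cast<unsigned long>(cp);
    // %04lx / %08lx print lowercase hex, zero-padded to 4 or 8 digits.
    const int n = (value < 0x10000UL)
        ? std::snprintf(buf, sizeof buf, "\\u%04lx", value)
        : std::snprintf(buf, sizeof buf, "\\U%08lx", value);
    // snprintf returns the number of characters written (excluding the NUL).
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

Under these assumptions, escape_code_point(U'A') yields \u0041, while escape_code_point(U'a') yields \u0061.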

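Second, I am using the Unix command-line utility printf to turn a \uXXXX string back into a character. I tried "\u092b", and it prints a character different from the one in my font file. Can anyone explain the reason behind this?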
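One way to investigate a mismatch like this is to look at the bytes that the escape stands for. The sketch below encodes a code point as UTF-8, which is what a \u escape would be expected to produce in a UTF-8 locale. The helper name encode_utf8 is hypothetical, and the sketch says nothing about how a particular font maps those bytes to glyphs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one Unicode scalar value as UTF-8 bytes.
std::string encode_utf8(char32_t cp)
{
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                      // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes: 11110xxx followed by three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+092B (DEVANAGARI LETTER PHA) this gives the three bytes E0 A4 AB. If the terminal is not decoding UTF-8, or the font places its glyphs at other code points, the character shown can differ from the expected one.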

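Here is a function that does this using standard C++ (although, depending on CharT, it may have requirements that some valid implementation-defined behavior does not meet).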

#include <codecvt>   // char16_t/char32_t conversion support (deprecated since C++17)
#include <locale>    // std::wstring_convert, std::codecvt
#include <string>
#include <sstream>
#include <iomanip>
#include <iostream>

// std::codecvt has a protected destructor, but std::wstring_convert must be able to
// delete the facet it owns, so wrap the facet in a type with a public destructor.
template<typename Facet>
struct deletable_facet : Facet {
    using Facet::Facet; // inherit constructors
    ~deletable_facet() {}
};

template<typename CharT,typename traits,typename allocator>
std::basic_string<CharT,traits,allocator>
to_uescapes(std::basic_string<CharT,traits,allocator> const &input)
{
    // String converter from CharT to char. If CharT = char, no conversion is done.
    // If CharT is char32_t or char16_t, the conversion is UTF-32/16 -> UTF-8. Not all implementations support this yet.
    // If CharT is something else, this uses implementation-defined encodings and only works for us if the implementation uses UTF-8 as the narrow char encoding.
    std::wstring_convert<deletable_facet<std::codecvt<CharT,char,std::mbstate_t>>,CharT> convertA;
    // String converter from UTF-8 -> UTF-32. Not all implementations support this yet.
    std::wstring_convert<deletable_facet<std::codecvt<char32_t,char,std::mbstate_t>>,char32_t> convertB;
    // Convert from the input encoding to UTF-32 (assuming convertA produces a UTF-8 string).
    std::u32string u32input = convertB.from_bytes(convertA.to_bytes(input));
    std::basic_stringstream<CharT,traits,allocator> ss;
    ss.fill('0');
    ss << std::hex;
    for(char32_t c : u32input) {
        // Code points below U+10000 fit in \uXXXX; the rest need \UXXXXXXXX.
        if(c < U'\U00010000')
            ss << convertA.from_bytes("\\u") << std::setw(4) << (unsigned int)c;
        else
            ss << convertA.from_bytes("\\U") << std::setw(8) << (unsigned int)c;
    }
    return ss.str();
}
template<typename CharT>
std::basic_string<CharT>
to_uescapes(CharT const *input)
{
    return to_uescapes(std::basic_string<CharT>(input));
}
int main() {
    // Before C++20, a u8 literal is an array of char, so CharT is deduced as char.
    std::string s = to_uescapes(u8"Hello \U00010000");
    std::cout << s << '\n';
}

This should print:

\u0048\u0065\u006c\u006c\u006f\u0020\U00010000
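
Because std::wstring_convert and the <codecvt> header are deprecated since C++17, the same output can also be produced without them. The following is a minimal sketch and not the answer's method: it assumes the input is UTF-8 held in a std::string, performs only basic validation (it does not reject overlong forms or surrogates), and the name to_uescapes_utf8 is hypothetical.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical alternative: decodes UTF-8 by hand and writes each code point
// as \uXXXX (below U+10000) or \UXXXXXXXX (U+10000 and above).
std::string to_uescapes_utf8(const std::string &input)
{
    std::ostringstream ss;
    ss << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < input.size();) {
        const unsigned char lead = static_cast<unsigned char>(input[i]);
        unsigned long cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > input.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte must look like 10xxxxxx and adds six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cont = static_cast<unsigned char>(input[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        // setw applies only to the next insertion, which is the number itself.
        if (cp < 0x10000UL)
            ss << "\\u" << std::setw(4) << cp;
        else
            ss << "\\U" << std::setw(8) << cp;
        i += len;
    }
    return ss.str();
}
```

Called with the UTF-8 bytes of "Hello " followed by U+10000, this sketch produces the same line as shown above.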