Load and save an HTML file with Polish characters

Updated: 2023-10-16

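I need to load an HTML template file (using std::ifstream), add some content, and then save it as a complete web page. This would be simple enough if it weren't for the Polish characters: I have tried every combination of char/wchar_t, Unicode/Multi-Byte character set, iso-8859-2/utf-8, and ANSI/utf-8, and none of them worked for me (some characters are always displayed incorrectly, and some are not displayed at all).

I could paste a lot of code and files here, but I'm not sure that would help. Perhaps you can just tell me: what format/encoding should the template file have, what encoding should I declare in the web page, and how should I load and save the file to get the proper result?

(If my question is not specific enough, or you do need code/file examples, let me know.)

Edit: I have tried the library suggested in the comments: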

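#include <iterator>
#include <string>
#include "utf8.h"

std::string fix_utf8_string(std::string const & str)
{
    std::string temp;
    // Copy str into temp, replacing invalid UTF-8 sequences with a replacement character.
    utf8::replace_invalid(str.begin(), str.end(), std::back_inserter(temp));
    return temp; // return the repaired copy, not the untouched input
}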

Calling:

fix_utf8_string("wynik działania pozytywny ąśżźćńłóę");

throws utf8::not_enough_room. What am I doing wrong?
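
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: if the source file is saved in Windows-1250, the literal ends with the byte 0xEA (ę). Read as UTF-8, 0xEA is the first byte of a three-byte sequence, so the input appears to stop in the middle of a character, which is the condition utf8::not_enough_room describes. The sketch below only illustrates that byte-level check; ExpectedUtf8Length and EndsWithTruncatedLead are hypothetical helper names, not part of the library.

```cpp
#include <cstddef>
#include <string>

// Number of bytes a UTF-8 sequence should have, judged by its first byte.
// Returns 0 for bytes that cannot start a sequence (continuation or invalid bytes).
std::size_t ExpectedUtf8Length(unsigned char lead)
{
    if (lead < 0x80) return 1;           // 0xxxxxxx: plain ASCII
    if ((lead >> 5) == 0x06) return 2;   // 110xxxxx
    if ((lead >> 4) == 0x0E) return 3;   // 1110xxxx
    if ((lead >> 3) == 0x1E) return 4;   // 11110xxx
    return 0;
}

// Hypothetical helper: true if the last byte announces a multi-byte
// sequence that the string ends before completing.
bool EndsWithTruncatedLead(const std::string & text)
{
    if (text.empty()) return false;
    return ExpectedUtf8Length(static_cast<unsigned char>(text[text.size() - 1])) > 1;
}
```

Under that assumption, EndsWithTruncatedLead("\xEA") is true for the Windows-1250 byte of ę, while the UTF-8 form "\xC4\x99" ends with a continuation byte and yields false.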

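Not sure if this is the (perfect) way, but the following solution works for me!

I saved my HTML template file as ANSI (or at least that's what Notepad++ says) and changed every write to the file stream from: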

file << std::string("some text with polish chars: ąśżźćńłóę");

to:

file << ToUtf8("some text with polish chars: ąśżźćńłóę");

where:

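#include <windows.h>
#include <string>

// Converts text encoded in Windows-1250 (the Central European "ANSI" code page) to UTF-8.
std::string ToUtf8(const std::string & ansiText)
{
    if (ansiText.empty())
        return std::string(); // both API calls report failure for zero-length input
    const int ansiSize = static_cast<int>(ansiText.size());
    // The first call returns the required number of wide characters, the second one converts.
    const int wideSize = MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, NULL, 0);
    std::wstring wideText(wideSize, L'\0');
    MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, &wideText[0], wideSize);
    // Same two-step pattern from wide characters to UTF-8 (65001 is CP_UTF8).
    const int utf8Size = WideCharToMultiByte(65001, 0, wideText.c_str(), wideSize, NULL, 0, NULL, NULL);
    std::string utf8Text(utf8Size, '\0'); // sized to fit, instead of a fixed 1024-byte buffer
    WideCharToMultiByte(65001, 0, wideText.c_str(), wideSize, &utf8Text[0], utf8Size, NULL, NULL);
    return utf8Text;
}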

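The basic idea is to use the MultiByteToWideChar() and WideCharToMultiByte() functions to convert the string from ANSI (multi-byte) to wide char, and then from wide char to utf-8 (more here: http://www.chilkatsoft.com/p/p_348.asp). The best part is that I didn't have to change anything else: no switch from std::ofstream to std::wofstream, no third-party library, and no change to the way I actually use the file stream (other than converting the strings to utf-8, which is necessary)!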
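
To make the two-step conversion more concrete, here is a small portable sketch of what happens to each byte. It is not the Windows API and not a replacement for it: it covers only ASCII and the 18 Polish letters of Windows-1250, and the names Cp1250PolishToCodePoint and PolishCp1250ToUtf8 are made up for this illustration.

```cpp
#include <string>

// Step 1 (the "wide char" half): map a Windows-1250 byte to a Unicode code point.
// Only ASCII and the Polish letters are handled; anything else returns 0.
unsigned int Cp1250PolishToCodePoint(unsigned char byte)
{
    if (byte < 0x80) return byte;
    switch (byte) {
        case 0xA5: return 0x0104; // Ą
        case 0xB9: return 0x0105; // ą
        case 0xC6: return 0x0106; // Ć
        case 0xE6: return 0x0107; // ć
        case 0xCA: return 0x0118; // Ę
        case 0xEA: return 0x0119; // ę
        case 0xA3: return 0x0141; // Ł
        case 0xB3: return 0x0142; // ł
        case 0xD1: return 0x0143; // Ń
        case 0xF1: return 0x0144; // ń
        case 0xD3: return 0x00D3; // Ó
        case 0xF3: return 0x00F3; // ó
        case 0x8C: return 0x015A; // Ś
        case 0x9C: return 0x015B; // ś
        case 0x8F: return 0x0179; // Ź
        case 0x9F: return 0x017A; // ź
        case 0xAF: return 0x017B; // Ż
        case 0xBF: return 0x017C; // ż
        default:   return 0;      // not in this deliberately small table
    }
}

// Step 2 (the "utf-8" half): encode each code point as one or two UTF-8 bytes.
std::string PolishCp1250ToUtf8(const std::string & ansiText)
{
    std::string utf8Text;
    for (std::string::size_type i = 0; i < ansiText.size(); ++i) {
        const unsigned int cp = Cp1250PolishToCodePoint(static_cast<unsigned char>(ansiText[i]));
        if (cp == 0) {
            utf8Text += '?'; // unmapped byte (or NUL) in this sketch
        } else if (cp < 0x80) {
            utf8Text += static_cast<char>(cp);
        } else {
            // Every Polish letter is below U+0800, so two bytes are enough.
            utf8Text += static_cast<char>(0xC0 | (cp >> 6));
            utf8Text += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return utf8Text;
}
```

For example, the single Windows-1250 byte 0xB9 (ą) maps to U+0105 and comes out as the two bytes 0xC4 0x85, which is why text written without the conversion shows up garbled in a page declared as utf-8.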

It should probably work for other languages as well, although I have not tested that.
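
For the load/add/save part of the question, a minimal sketch of the stream handling might look like the following. It assumes every string has already been converted to UTF-8 before it reaches the stream and that the page declares <meta charset="utf-8">. LoadFile, SaveFile, FillTemplate and the {{content}} marker are hypothetical names chosen for the example.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Reads the whole file as raw bytes; no encoding conversion happens here.
// Returns an empty string if the file cannot be opened.
std::string LoadFile(const std::string & path)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Writes the bytes unchanged; returns false if the file could not be written.
bool SaveFile(const std::string & path, const std::string & bytes)
{
    std::ofstream out(path.c_str(), std::ios::binary);
    out << bytes;
    out.flush();
    return out.good();
}

// Replaces the first occurrence of a marker in the template with the content.
// The page is returned unchanged if the marker is not found.
std::string FillTemplate(std::string page, const std::string & marker, const std::string & content)
{
    const std::string::size_type pos = page.find(marker);
    if (pos != std::string::npos)
        page.replace(pos, marker.size(), content);
    return page;
}
```

Opening both streams with std::ios::binary keeps the bytes exactly as they are, so a narrow std::ofstream is sufficient as long as the text is already UTF-8 when it is written.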