How to write a non-English string to a file and read from that file with C++?

Updated: 2023-10-16

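I want to write a std::wstring to a file and read that content back as a std::wstring. This works as expected when the string is L"<Any English letter>". But the problem appears as soon as the string contains Bengali, Kannada, Japanese or any other non-English characters. I tried various options, such as:

  1. Converting the std::wstring to a std::string and writing that to the file, then reading it back as a std::string and converting it to a std::wstring
    • Writing works (I can see the text in an editor), but reading it back goes wrong.
  2. Writing the std::wstring with a wofstream; this does not work for native-language characters either, e.g. std::wstring data = L"হ্যালো ওয়ার্ল্ড";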

The platforms are macOS and Linux, and the language is C++.

Code:

#include <fstream>
#include <iostream>
#include <string>

bool
write_file(
    const char*         path,
    const std::wstring  data
) {
    bool status = false;
    try {
        std::wofstream file(path, std::ios::out|std::ios::trunc|std::ios::binary);
        if (file.is_open()) {
            //std::string data_str = convert_wstring_to_string(data);
            file.write(data.c_str(), (std::streamsize)data.size());
            file.close();
            status = true;
        }
    } catch (...) {
        std::cout<<"exception !"<<std::endl;
    }
    return status;
}

// Read Method
std::wstring
read_file(
    const char*  filename
) {
    std::wifstream fhandle(filename, std::ios::in | std::ios::binary);
    if (fhandle) {
        std::wstring contents;
        fhandle.seekg(0, std::ios::end);
        contents.resize((int)fhandle.tellg());
        fhandle.seekg(0, std::ios::beg);
        fhandle.read(&contents[0], contents.size());
        fhandle.close();
        return(contents);
    }
    else {
        return L"";
    }
}
// Main
int main()
{
    const char* file_path_1 = "./file_content_1.txt";
    const char* file_path_2 = "./file_content_2.txt";
    //std::wstring data = L"Text message to write onto the file\n";  // This works as expected
    std::wstring data = L"হ্যালো ওয়ার্ল্ড";  // This does not work as expected
    // Let's write some data
    write_file(file_path_1, data);
    // Let's read the file
    std::wstring out = read_file(file_path_1);
    std::wcout<<L"File Content: "<<out<<std::endl;
    // Let's write that same data to a different file
    write_file(file_path_2, out);
    return 0;
}
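
How wchar_t is output depends on the locale. The default locale ("C") usually does not accept anything outside ASCII (Unicode code points 0x20…0x7E, plus a few control characters).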

Any program that handles text should start main with:

std::locale::global( std::locale( "" ) );

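If the program uses any of the standard stream objects, it should also imbue them with the global locale before doing any input or output.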
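A minimal sketch of that advice, assuming the environment (for example LANG=en_US.UTF-8) names a UTF-8 locale; the helper name init_text_locale is hypothetical. It falls back to the classic "C" locale because std::locale("") throws std::runtime_error when the environment names a locale that is not installed.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Hypothetical helper: install the environment's locale globally, then
// imbue the standard streams, which were constructed before main() ran
// and therefore still use the old global locale.
// Returns false if it had to fall back to the classic "C" locale.
bool init_text_locale() {
    bool from_env = true;
    try {
        std::locale::global(std::locale(""));
    } catch (const std::runtime_error&) {
        // LANG / LC_ALL names a locale that is not available
        std::locale::global(std::locale::classic());
        from_env = false;
    }
    // std::locale() yields a copy of the (new) global locale
    std::cin.imbue(std::locale());
    std::cout.imbue(std::locale());
    std::cerr.imbue(std::locale());
    std::wcin.imbue(std::locale());
    std::wcout.imbue(std::locale());
    std::wcerr.imbue(std::locale());
    return from_env;
}
```

Call it as the first statement of main, before anything is read or written; file streams opened afterwards pick up the new global locale automatically.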

To read and write Unicode files (assuming you want to write Unicode characters), you can try fopen_s:

FILE *file;
if (fopen_s(&file, file_path, "w,ccs=UNICODE") == 0)   // fopen_s returns 0 on success
{
    fputws(your_wstring().c_str(), file);
    fclose(file);
}

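One problem when reading the string back is that you set the string's length to the number of bytes in the file, not the number of characters. This means you try to read past the end of the file, and the end of the string will contain garbage.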
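To see the mismatch concretely: in a UTF-8 file every Bengali code point takes three bytes, so the byte count overstates the number of characters. This small sketch (the name count_utf8_code_points is hypothetical) counts code points by skipping UTF-8 continuation bytes, which have the bit pattern 10xxxxxx; it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of Unicode code points in a UTF-8 string.
// Each code point starts with exactly one byte that is NOT of the form
// 10xxxxxx, so counting those bytes counts code points.
// Assumes valid UTF-8 input.
std::size_t count_utf8_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the six Bengali code points "\u09B9\u09CD\u09AF\u09BE\u09B2\u09CB", std::string::size() reports 18 bytes while the function returns 6.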

If you are working with text files, why not simply use the ordinary output and input operators << and >>, or other text functions such as std::getline?
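A minimal sketch of that suggestion: write with << and read with std::getline, so the reader never needs the file size. The function names are hypothetical. Wide streams still convert through their locale, so non-ASCII text additionally needs the locale or codecvt setup described in the other answers.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write each string followed by a newline.
bool write_lines(const char* path, const std::vector<std::wstring>& lines) {
    std::wofstream out(path, std::ios::out | std::ios::trunc);
    for (const std::wstring& line : lines) {
        out << line << L'\n';
    }
    out.flush();  // push buffered output so write errors show in the state
    return static_cast<bool>(out);
}

// Hypothetical helper: read the file back one line at a time.
std::vector<std::wstring> read_lines(const char* path) {
    std::vector<std::wstring> lines;
    std::wifstream in(path);
    std::wstring line;
    // getline stops at each '\n' and at end of file
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```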

Late edit: this is for Windows (the question had no platform tags when I answered).

You need to set the stream to a locale that supports these characters. Try something like this (for UTF-8/UTF-16):

std::wofstream myFile("out.txt"); // writing to this file 
myFile.imbue(std::locale(myFile.getloc(), new std::codecvt_utf8_utf16<wchar_t>));

You have to do the same when reading from that file:

std::wifstream myFile2("out.txt"); // reading from this file
myFile2.imbue(std::locale(myFile2.getloc(), new std::codecvt_utf8_utf16<wchar_t>));
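On Linux and macOS wchar_t is 32 bits wide and holds UTF-32, so the facet that matches it there is std::codecvt_utf8<wchar_t>; codecvt_utf8_utf16 treats wchar_t as UTF-16 and can mishandle characters outside the Basic Multilingual Plane (such as emoji). The sketch below applies that facet to a full write/read round trip in the spirit of the question's write_file/read_file; the names write_utf8, read_utf8 and utf8_locale are hypothetical. The <codecvt> facets are deprecated since C++17 but are still shipped by libstdc++ and libc++.

```cpp
#include <codecvt>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>

// Assumes a 32-bit wchar_t (Linux, macOS): codecvt_utf8<wchar_t> converts
// between UTF-32 in memory and UTF-8 on disk (no BOM by default).
static std::locale utf8_locale() {
    // The locale takes ownership of the facet and deletes it later
    return std::locale(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
}

bool write_utf8(const char* path, const std::wstring& data) {
    std::wofstream file;
    file.imbue(utf8_locale());  // imbue before opening, i.e. before any I/O
    file.open(path, std::ios::out | std::ios::trunc | std::ios::binary);
    file << data;
    file.close();               // sets failbit if open/write/flush failed
    return !file.fail();
}

std::wstring read_utf8(const char* path) {
    std::wifstream file;
    file.imbue(utf8_locale());
    file.open(path, std::ios::in | std::ios::binary);
    // Read decoded wide characters until EOF; no byte count is involved
    return std::wstring(std::istreambuf_iterator<wchar_t>(file),
                        std::istreambuf_iterator<wchar_t>());
}
```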

Don't use wstring or wchar_t. On non-Windows platforms, wchar_t is nearly worthless these days.

Instead, you should use UTF-8.

#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

bool
write_file(
    const char*         path,
    const std::string&  data
) {
    try {
        std::ofstream file(path, std::ios::out | std::ios::trunc | std::ios::binary);
        // Throw on open or write failure instead of failing silently
        file.exceptions(std::ios::failbit | std::ios::badbit);
        file << data;
        return true;
    } catch (...) {
        std::cout << "exception!\n";
        return false;
    }
}

// Read Method: with UTF-8 in a narrow string, the byte count is the string length
std::string
read_file(
    const char*  filename
) {
    std::ifstream fhandle(filename, std::ios::in | std::ios::binary);
    if (fhandle) {
        std::string contents;
        fhandle.seekg(0, std::ios::end);
        contents.resize(static_cast<std::size_t>(fhandle.tellg()));
        fhandle.seekg(0, std::ios::beg);
        fhandle.read(&contents[0], contents.size());
        return contents;
    } else {
        return "";
    }
}

int main()
{
  const char* file_path_1 = "./file_content_1.txt";
  const char* file_path_2 = "./file_content_2.txt";
  std::string data = "হ্যালো ওয়ার্ল্ড"; // Linux and OS X compilers use UTF-8 as the default execution encoding.
  write_file(file_path_1, data);
  std::string out = read_file(file_path_1);
  // std::cout passes the UTF-8 bytes through unchanged; a UTF-8 terminal displays them
  std::cout << "File Content: " << out << '\n';
  write_file(file_path_2, out);
}
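If you keep UTF-8 in std::string as suggested but occasionally need individual characters (for example to count or inspect them), you can decode to std::u32string on demand instead of going through wchar_t. This is a minimal hand-written sketch, not a full validator: the name utf8_to_utf32 is hypothetical, and each byte that does not start a well-formed sequence is replaced with U+FFFD.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 into UTF-32 code points.
// Simplified: checks lead/continuation bit patterns and truncation only;
// it does not reject overlong forms or encoded surrogates.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else { out += U'\uFFFD'; ++i; continue; }  // stray continuation or bad lead

        // Accumulate 6 payload bits from each continuation byte
        bool ok = i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += U'\uFFFD'; ++i; continue; }  // truncated or malformed
        out += cp;
        i += len;
    }
    return out;
}
```

For example, utf8_to_utf32(out).size() gives the number of code points in the string read by read_file above, independent of how many bytes each one occupies.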