Create longest possible string from vector<char>

Keywords: string creation, char, vector. Updated: 2023-10-16

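I receive data as a vector<char> and need to create a string from it. The vector can contain UTF-16 characters (i.e. null bytes) and has a fixed size. The actual data is padded with null bytes up to that fixed size. So, for example, I could have the following vector: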

\0 a \0 b \0 c \0 d \0 \0 \0 \0

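The fixed size is 12, and the vector contains the UTF-16 string "abcd" padded with 4 null characters.

From this I need to extract the actual string. I already have code that converts UTF-16 to string; the thing I am confusing myself with is finding the number of characters (bytes) in the vector without the padding. In the example above, that number is 8.

I started out like this: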

std::string CrmxFile::StringFromBytes(std::vector<char> data, int fixedsize) {
    // Walk backwards from the end, skipping the zero padding.
    std::vector<char>::reverse_iterator it = data.rbegin();
    while(it != data.rend() && *it == '\0') {
        it++;
    }
    return std::string(&data[0], fixedsize - (it - data.rbegin()));
}

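However, in the full context the vector contains a lot of data, and I need to do the above on only a specified part of it. For example, the vector may contain 1000 elements, and I need to get the string that starts at position 30 and is at most 12 characters long. Of course, I could create another vector and copy the required 12 characters into it before applying the logic above, but I feel I should be able to do something directly on the given vector. However, I can't get a grip on which iterators I am comparing against.

Now, this is embarrassing: vector<char>::iterator is obviously a random access iterator, so I can decrement it. My method therefore now looks like this: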

std::string CrmxFile::StringFromBytes(std::vector<char> data, int start, int length) {
    std::vector<char>::iterator begin = data.begin() + start;
    std::vector<char>::iterator end = begin + length;  // one past the window
    // Step back over the zero padding without ever moving before begin.
    while(end != begin && *(end - 1) == '\0') {
        end--;
    }
    if(end != begin) {
        int len = end - begin;
        if(IsUtf8Heuristic(begin, begin + len)) {
            return std::string(begin, begin + len);
        }
        else {  //(heuristically this is utf-16)
            len = ((len + 1) >> 1) << 1;  // round up to an even byte count
            std::string res;
            ConvertUtf16To8(begin, begin + len, std::back_inserter(res));
            return res;
        }
    }
    else {
        return "";
    }
}
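The same backward search can also be written with std::find_if over reverse iterators that cover only the requested window. The following is a minimal sketch rather than the asker's actual code: the free function UnpaddedLength and its parameter names are hypothetical, and it assumes the padding consists solely of zero bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: number of bytes in data[start, start + length)
// that remain once the trailing zero bytes are dropped.
std::size_t UnpaddedLength(const std::vector<char>& data,
                           std::size_t start, std::size_t length) {
    if (start >= data.size()) return 0;
    // Clamp the window so it never runs past the end of the vector.
    length = std::min(length, data.size() - start);

    typedef std::vector<char>::const_iterator It;
    It first = data.begin() + static_cast<std::ptrdiff_t>(start);
    It last = first + static_cast<std::ptrdiff_t>(length);

    // Search backwards from the end of the window for a non-zero byte.
    std::reverse_iterator<It> rbeg(last);
    std::reverse_iterator<It> rend(first);
    std::reverse_iterator<It> hit =
        std::find_if(rbeg, rend, [](char c) { return c != '\0'; });

    // hit.base() is one past the last non-zero byte, or first if none exists.
    return static_cast<std::size_t>(hit.base() - first);
}
```

For the 12-byte example vector, UnpaddedLength(v, 0, 12) gives 8. Because the reverse range is bounded by the window, no iterator is ever moved before the start of the vector.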

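As I understand the question, you want to extract a portion of at most fixedsize bytes from data and strip all trailing zeros. From the comments, you want the optimal solution.

To me, your code is overly complicated if the data always arrives as an array. Use indices; they are more self-describing.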

std::vector<char> data = ...;
int fixedsize = ...;
int start = ...;
int i = start + fixedsize - 1; // last character that can be in the string
while(i >= start && data[i] == 0) i--; // 'remove' the trailing zeroes
std::string result(&data[start], i - start + 1);

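This is the optimal algorithm; there is no "more optimal" one (there is a micro-optimization that consists of testing ints instead of chars, i.e. 4 consecutive chars at a time).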
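The index-based loop above can be wrapped into a self-contained function. This is only a sketch under stated assumptions: the name ExtractPadded and its parameters are hypothetical, the window is clamped to the bounds of the vector, and the result holds the raw bytes (still UTF-16 if that is what the input contained), so any conversion is left to existing code.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies data[start, start + fixedsize) into a string,
// leaving out the trailing zero bytes that serve as padding.
std::string ExtractPadded(const std::vector<char>& data,
                          std::size_t start, std::size_t fixedsize) {
    if (start >= data.size()) return std::string();
    // Clamp the window to the end of the vector.
    std::size_t end = start + std::min(fixedsize, data.size() - start);

    // Move end back over the trailing zero bytes.
    while (end > start && data[end - 1] == '\0') --end;

    // Pointer + count construction keeps embedded zero bytes intact.
    return std::string(data.data() + start, end - start);
}
```

For the 12-byte example from the question, ExtractPadded(v, 0, 12) returns an 8-byte string. With little-endian UTF-16 the high byte of the last code unit may itself be zero and would be stripped as padding, which is presumably why the question's code rounds the length up to an even number before converting.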
