Function to split strings using string::find and string::substr returns wrong tokens

Updated: 2023-10-16
//splits a string into a vector of multiple tokens
std::vector<string> split_str(std::string& str, const char* delimiter){
    std::vector<string> ret;
    size_t currPos = 0;
    //Add the first element to the vector
    if (str.find(delimiter) != string::npos)
        ret.push_back(str.substr(currPos, str.find(delimiter)));

    while (currPos != str.size() - 1){
        if (str.find(delimiter, currPos) != string::npos){
            //Current at one past the delimiter
            currPos = str.find(delimiter, currPos) + 1;
            //Substring everything from one past the delimiter until the next delimiter
            ret.push_back(str.substr(currPos, str.find(delimiter, currPos)));
        }
        //If last whitespace is not right at the end
        else if (currPos < str.size()){
            //Add the last element to the vector and end the loop
            ret.push_back(str.substr(currPos, str.size()));
            currPos = str.size() - 1;
        }
    }
    return ret;
}

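The function should take a string and a delimiter as input and return a vector of strings (tokens) as output. However, when I try it with a simple input such as:

ab bc cd de (with " " as the delimiter)

the output has 5 elements: "ab", "bc cd", "cd de", "de", "de"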

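The problem is that the second argument of std::string::substr() is a count, not a position. For example, with your input, str.substr(3, 5) returns the five characters starting at index 3 ("bc cd"), not the characters between positions 3 and 5. Your code should be changed from: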

if (str.find(delimiter) != string::npos)
    ret.push_back(str.substr(currPos, str.find(delimiter)));

to this:

auto fpos = str.find(delimiter);
if (fpos != string::npos)
    ret.push_back(str.substr(currPos, fpos - currPos));
    //                                ^^^^^^^^^^^^^^

and likewise for the other substr() calls inside the loop.
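To make the count-versus-position fix concrete, here is a minimal sketch of the whole function with every substr() call given a length. The name split_str_fixed, the restructured loop, and the non-empty-delimiter precondition are assumptions made for illustration; they are not part of the original question or answer.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical name: a reworked version of the question's split_str in which
// the second argument of substr() is always a count, never a position.
std::vector<std::string> split_str_fixed(const std::string& str, const char* delimiter) {
    // Assumption of this sketch: the delimiter is a non-empty C string.
    assert(delimiter != nullptr && *delimiter != '\0');
    const std::size_t delimLen = std::strlen(delimiter);

    std::vector<std::string> ret;
    std::size_t currPos = 0;
    while (true) {
        const std::size_t next = str.find(delimiter, currPos);
        if (next == std::string::npos) {
            // No further delimiter: the rest of the string is the last token.
            ret.push_back(str.substr(currPos));
            break;
        }
        // Count = end position - start position.
        ret.push_back(str.substr(currPos, next - currPos));
        // Continue one past the whole delimiter.
        currPos = next + delimLen;
    }
    return ret;
}
```

Called as split_str_fixed("ab bc cd de", " "), this yields the four tokens ab, bc, cd, de. Note that in this sketch adjacent or trailing delimiters produce empty tokens.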

It would be more correct to use find_first_of instead of find. Take into account that a string can contain adjacent blanks and can also start with blanks.
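The difference between these search members can be checked with a few assertions. This is only an illustrative sketch: the function name check_search_semantics and the sample string are made up for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: asserts how find, find_first_of and find_first_not_of
// treat the same delimiter argument " ,".
void check_search_semantics() {
    const std::string s = "  ab,cd ef";

    // find() searches for the whole sequence " ," (space then comma),
    // which never occurs in s.
    assert(s.find(" ,") == std::string::npos);

    // find_first_of() matches any single character from the set {' ', ','}.
    assert(s.find_first_of(" ,") == 0);

    // find_first_not_of() skips leading delimiters and lands on the first token.
    assert(s.find_first_not_of(" ,") == 2);

    // Searching from the start of a token gives the position just past its end.
    assert(s.find_first_of(" ,", 2) == 4);

    // The token length is therefore the difference of the two positions.
    assert(s.substr(2, 4 - 2) == "ab");
}
```

Pairing find_first_not_of (start of a token) with find_first_of (end of a token) is what lets a splitter ignore leading and repeated delimiters.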

Here is a demonstrative program that shows how the function can be written

#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> split_str( const std::string &s, const char *delimiter )
{
    std::vector<std::string> v;
    size_t n = 0;
    for ( std::string::size_type pos = 0;
          ( pos = s.find_first_not_of( delimiter, pos ) ) != std::string::npos;
          pos = s.find_first_of( delimiter, pos ) )
    {
        ++n;
    }        
    v.reserve( n );
    for ( std::string::size_type pos = 0;
          ( pos = s.find_first_not_of( delimiter, pos ) ) != std::string::npos; )
    {
        auto next_pos = s.find_first_of( delimiter, pos );
        if ( next_pos == std::string::npos ) next_pos = s.size();
        v.push_back( s.substr( pos, next_pos - pos ) );
        pos = next_pos;
    }        
    return v;
}

int main() 
{
    std::string s( "ab bc cd de " );
    std::cout << s << std::endl;    
    auto v = split_str( s, " " );
    for ( auto t : v ) std::cout << t << std::endl;
    return 0;
}

The program output is

ab bc cd de 
ab
bc
cd
de