strtok how to also include delimiters as tokens

Keywords: token, delimiter, include, strtok. Updated: 2023-10-16

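I currently have code that splits my string into tokens using comma, semicolon, equals sign and space as delimiters (" ,;="). I would also like the special characters themselves to come out as tokens.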

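// str is a std::string; whichType() classifies each token (needs <cstring>)
const char* delims = " ,;=";
char* cstr = new char[str.length() + 1];
std::strcpy(cstr, str.c_str());
char* p = std::strtok(cstr, delims);    // first call: pass the buffer
while (p != nullptr)
{
    whichType(p);
    p = std::strtok(nullptr, delims);   // later calls: pass nullptr to continue
}
delete[] cstr;                          // release the copy when done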

So right now, if I print out the tokens of a string such as asd sdf qwe wer,sdf;wer, I get:

asd
sdf
qwe
wer
sdf
wer

I would like it to look like this:

asd
sdf
qwe
wer
,
sdf
;
wer

Any help would be great. Thanks.

You need more flexibility. (Also, strtok is a terrible, error-prone interface.)

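Here is a flexible algorithm that generates tokens and copies them to an output iterator. That means you can use it to fill any container you like, or print the tokens directly to an output stream (which is what I'll do in the demo).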

The behavior is controlled by option flags:

enum tokenize_options
{
    tokenize_skip_empty_tokens              = 1 << 0,
    tokenize_include_delimiters             = 1 << 1,
    tokenize_exclude_whitespace_delimiters  = 1 << 2,
    //
    tokenize_options_none    = 0,
    tokenize_default_options =   tokenize_skip_empty_tokens 
                               | tokenize_exclude_whitespace_delimiters
                               | tokenize_include_delimiters,
};

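Note that I distilled an extra requirement you didn't state but your sample implies: you want delimiters output as tokens unless they are whitespace (' '). That is what the third option, tokenize_exclude_whitespace_delimiters, is for.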

Now here's the real meat:

#include <algorithm>  // std::find
#include <cctype>     // std::isspace
#include <iterator>   // std::begin, std::end
#include <string>

template <typename Input, typename Delimiters, typename Out>
Out tokenize(
        Input const& input,
        Delimiters const& delim,
        Out out,
        tokenize_options options = tokenize_default_options
        )
{
    // decode option flags
    const bool includeDelim   = options & tokenize_include_delimiters;
    const bool excludeWsDelim = options & tokenize_exclude_whitespace_delimiters;
    const bool skipEmpty      = options & tokenize_skip_empty_tokens;
    using namespace std;
    string accum;
    for(auto it = begin(input), last = end(input); it != last; ++it)
    {
        if (find(begin(delim), end(delim), *it) == end(delim))
        {
            accum += *it;    // not a delimiter: extend the current token
        }
        else
        {
            // output the token collected so far (it may be empty)
            if (!(skipEmpty && accum.empty()))
                *out++ = accum;
            // output the delimiter. When input and delimiters are string
            // literals, begin()/end() also cover the terminating '\0', which
            // therefore acts as a delimiter; treat it like whitespace.
            bool isWhitespace = std::isspace(static_cast<unsigned char>(*it)) || (*it == '\0');
            if (includeDelim && !(excludeWsDelim && isWhitespace))
            {
                *out++ = string(1, *it); // dump the delimiter as a separate token
            }
            accum.clear();
        }
    }
    if (!accum.empty())
        *out++ = accum;      // last token, if the input doesn't end in a delimiter
    return out;
}

Full demo: Live on Ideone (default options) and Live on Coliru (no options).

#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

int main()
{
    // let's print tokens to stdout, one per line
    std::ostringstream oss;
    std::ostream_iterator<std::string> out(oss, "\n");
    tokenize("asd sdf qwe wer,sdf;wer", " ;,", out/*, tokenize_options_none*/);
    std::cout << oss.str();
    // that's all, folks
}

Prints:

asd
sdf
qwe
wer
,
sdf
;
wer

I'm afraid you can't use strtok for this; you need a proper tokenizer.

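If your tokens are simple, I'd suggest coding it by hand, i.e. scanning the string character by character. If not, I'd suggest looking at some of the available alternatives. Or, if it's really complex, you could use a lexer generator like flex.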

// TRY THE FOLLOWING CODE
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string line = "asd sdf qwe wer,sdf;wer";
    std::vector<std::string> wordVector;
    std::string::size_type prev = 0, pos;
    // find the next delimiter; the text between prev and pos is a word
    while ((pos = line.find_first_of(" ,;", prev)) != std::string::npos) {
        if (pos > prev)
            wordVector.push_back(line.substr(prev, pos - prev));
        prev = pos + 1;
        // keep every delimiter except the space as a token of its own
        if (line[pos] != ' ')
            wordVector.push_back(std::string(1, line[pos]));
    }
    // the text after the last delimiter is the final word
    if (prev < line.length())
        wordVector.push_back(line.substr(prev, std::string::npos));
    for (const std::string& word : wordVector)
        std::cout << "\n" << word;
    return 0;
}
Output:
asd
sdf
qwe
wer
,
sdf
;
wer