boost::tokenizer: split on dots, but also keep empty fields

Keywords: keep fields, dot-separated. Updated: 2023-10-16

I saw this question. Mine is very similar, but it is different, so please don't mark it as a duplicate.

My question is: how do I get the empty fields out of a string?

I have a string like std::string s = "This.is..a.test"; and I want to get the fields <This> <is> <> <a> <test>.

I tried:

typedef boost::char_separator<char> ChSep;
typedef boost::tokenizer<ChSep> TknChSep;
ChSep sep(".", ".", boost::keep_empty_tokens);
TknChSep tok(s, sep);
for (TknChSep::iterator beg = tok.begin(); beg != tok.end(); ++beg)
{
  std::cout << "<" << *beg << "> ";
}

But I get <This> <.> <is> <.> <> <.> <a> <.> <test>

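The second argument of boost::char_separator (the separator used with Boost.Tokenizer) is the kept_delims parameter. It specifies which delimiters will show up as tokens of their own. The original code passes ".", so every dot is also returned as a token. To fix this, change: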

ChSep sep(".", ".", boost::keep_empty_tokens);

to:

ChSep sep(".", "", boost::keep_empty_tokens);
            // ^-- no delimiters will show up as tokens.

Here is a complete example:

#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>
int main()
{
  std::string str = "This.is..a.test";
  typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
  boost::char_separator<char> sep(
      ".", // dropped delimiters
      "",  // kept delimiters
      boost::keep_empty_tokens); // empty token policy
  BOOST_FOREACH(std::string token, tokenizer(str, sep))
  {
    std::cout << "<" << token << "> ";
  }
  std::cout << std::endl;
}

This produces the desired output:

<This> <is> <> <a> <test>

I would skip boost::tokenizer and just use a standard regex to do the splitting:

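#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
int main() {
    std::string s = "This.is..a.test";
    std::regex sep{ "\\." };  // escaped dot: match a literal '.', not any character
    // Submatch index -1 yields the text between matches, i.e. the fields.
    std::copy(std::sregex_token_iterator(s.begin(), s.end(), sep, -1),
        std::sregex_token_iterator(),
        std::ostream_iterator<std::string>(std::cout, "\n"));
}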

Result (the blank line is the empty field between the two dots):

This
is

a
test