Regex to build a vector from braces

Keywords: build, vector, Regex      Updated: 2023-10-16

I want to turn a std::string such as:

"{1, 2}, {one, two}, {123, onetwothree}"

into a std::vector of std::pairs of std::strings, which would look like:

std::vector<std::pair<std::string, std::string>> v = {{"1", "2"}, {"one", "two"}, {"123", "onetwothree"}};
// where, for instance
v[0] == std::make_pair("1", "2"); // etc.

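It seems that the original std::string could be parsed most easily with std::regex, but I am no regex expert (or even a novice), let alone a std::regex expert. Any ideas for a recipe?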

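At the time of writing, <regex> does not work well with GCC (libstdc++ only gained a usable implementation in GCC 4.9), so here is a Boost version. Compile it with -lboost_regex.

Boost's capture feature would suit this case, but it is not enabled by default.

Here is the original post: Boost C++ regex - how to get multiple matches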

#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;
int main()
{
  string str = "{1, 2}, {one, two}, {123, onetwothree}";
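  // Backslashes must be doubled inside an ordinary string literal.
  boost::regex pair_pat("\\{[^{}]+\\}");      // one "{...}" group
  boost::regex elem_pat("\\s*[^,{}]+\\s*");   // one element, surrounding whitespace included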
  boost::sregex_token_iterator end;
  for(boost::sregex_token_iterator iter(str.begin(), str.end(), pair_pat, 0);
      iter != end; ++iter) {
    string pair_str = *iter;
    cout << pair_str << endl;
    for (boost::sregex_token_iterator it(pair_str.begin(), pair_str.end(), elem_pat, 0);
         it != end; ++it)
      cout << *it << endl;
  }
  return 0;
}
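
For readers whose standard library ships a working <regex>, the same two-pass idea can probably be written without Boost, and made to fill the vector the question asks for instead of printing. The following is only a minimal sketch under that assumption: the helper name split_pairs and the inner pattern [^\s,{}]+ (which skips surrounding whitespace) are illustrative choices, not part of the original answer.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original answer): two-pass parse that
// first isolates each "{...}" group, then splits the group into elements.
std::vector<std::pair<std::string, std::string>>
split_pairs(const std::string& input)
{
    static const std::regex pair_pat(R"(\{[^{}]+\})");  // one "{...}" group
    static const std::regex elem_pat(R"([^\s,{}]+)");   // one element, no whitespace
    std::vector<std::pair<std::string, std::string>> result;
    const std::sregex_token_iterator end;

    for (std::sregex_token_iterator group(input.begin(), input.end(), pair_pat, 0);
         group != end; ++group) {
        // Copy the group so the inner iterators refer to a string that
        // stays alive for the whole loop body.
        const std::string pair_str = group->str();
        std::vector<std::string> elems(
            std::sregex_token_iterator(pair_str.begin(), pair_str.end(), elem_pat, 0),
            end);
        // Assumption: groups that do not hold exactly two elements are skipped.
        if (elems.size() == 2)
            result.emplace_back(elems[0], elems[1]);
    }
    return result;
}
```

Calling split_pairs("{1, 2}, {one, two}, {123, onetwothree}") should yield the three pairs from the question. Skipping malformed groups silently is a design choice of this sketch; reporting an error may be preferable in real code.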

The matching pattern is quite simple: "\{\s*(\w+)\s*,\s*(\w+)\s*\}", so we just need to loop and assemble all the matches. C++11 makes this quite straightforward. Give it a try:

std::string str = "{1, 2}, {one, two}, {123, onetwothree}";
std::vector<std::pair<std::string, std::string>> pairs;
std::regex exp(R"(\{\s*(\w+)\s*,\s*(\w+)\s*\})");
std::smatch sm;
std::string::const_iterator cit = str.cbegin();
while (std::regex_search(cit, str.cend(), sm, exp)) {
    if (sm.size() == 3) // 3 = match, first item, second item
        pairs.emplace_back(sm[1].str(), sm[2].str());
    // the next line is a bit cryptic, but it just puts cit at the remaining string start
    cit = sm[0].second;
}

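Edit: an explanation of how it works. It matches the pattern once per iteration and, after each match, uses the const iterator to point at the remainder of the string: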

{1, 2}, {one, two}, {123, onetwothree}
^ iterator cit
-- regex_search matches "{1, 2}" sm[1] == "1", sm[2] == "2"
{1, 2}, {one, two}, {123, onetwothree}
      ^ iterator cit
-- regex_search matches "{one, two}" sm[1] == "one", sm[2] == "two"
{1, 2}, {one, two}, {123, onetwothree}
                  ^ iterator cit
-- regex_search matches "{123, onetwothree}" sm[1] == "123", sm[2] == "onetwothree"
{1, 2}, {one, two}, {123, onetwothree}
                                      ^ iterator cit
-- regex_search returns false, no match
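
The walk-through above, where each search resumes at the end of the previous match, is also what std::sregex_iterator does internally, so the manual bookkeeping with cit can be dropped. A minimal sketch, assuming the same pattern as in the answer; the function name parse_pairs is hypothetical and only used for illustration.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper name; the pattern is the one used in the answer.
std::vector<std::pair<std::string, std::string>>
parse_pairs(const std::string& input)
{
    static const std::regex pattern(R"(\{\s*(\w+)\s*,\s*(\w+)\s*\})");
    std::vector<std::pair<std::string, std::string>> pairs;

    // Incrementing the iterator continues the search where the previous
    // match ended, which replaces the manual "cit = sm[0].second" step.
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;  // m[0] whole match, m[1] and m[2] the items
        pairs.emplace_back(m[1].str(), m[2].str());
    }
    return pairs;
}
```

With the string from the question, parse_pairs should return the same three pairs as the loop above. Text between matches, such as the ", " separators, is simply skipped rather than validated.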