Boost regex expression capture

Tags: regex, boost | Updated: 2023-10-16

My goal is to capture an integer using boost::regex_search.

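#define BOOST_REGEX_MATCH_EXTRA   // must come before the Boost.Regex include
#include <boost/regex.hpp>
#include <cstdlib>
#include <iostream>
#include <string>

int main(int argc, char* argv[])
{
  std::string tests[4] = {
    "SomeString #222",
    "SomeString #1",
    "SomeString #42",
    "SomeString #-1"
  };
  // '#', then an optional minus sign and digits, anchored at the end of the string
  boost::regex rgx("#(-?[0-9]+)$");
  boost::smatch match;
  for (int i = 0; i < 4; ++i)
  {
    std::cout << "Test " << i << std::endl;
    // The contents of match are only meaningful if the search succeeded
    if (!boost::regex_search(tests[i], match, rgx, boost::match_extra))
      continue;
    // match[0] is the whole match, match[1] the parenthesised group
    for (std::size_t j = 0; j < match.size(); ++j)
    {
      std::string match_string;
      match_string.assign(match[j].first, match[j].second);
      std::cout << "    Match " << j << ": " << match_string << std::endl;
    }
  }
  std::system("pause");  // Windows-only: keeps the console window open
}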

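I noticed that each regex search produces two matches: the first is the whole matched string, and the second is the capture inside the parentheses.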

Test 0
    Match 0: #222
    Match 1: 222
Test 1
    Match 0: #1
    Match 1: 1
Test 2
    Match 0: #42
    Match 1: 42
Test 3
    Match 0: #-1
    Match 1: -1

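The documentation discourages using BOOST_REGEX_MATCH_EXTRA unless it is actually needed. Is it required to capture a single group inside parentheses, or is there another way?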

If you want more speed, Boost Spirit might deliver it, or alternatively Boost Xpressive.

Both generate code from expression templates. Among other things, this means that attribute values you don't "consume" cost you nothing.

Boost Spirit:

This solution is header-only. It can probably be made more efficient, but here is a start:

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
int main()
{
    std::string const tests[] = {
        "SomeString #222",
        "SomeString #1",
        "SomeString #42",
        "SomeString #-1"
    };
    for(auto& input : tests)
    {
        int value;
        auto f(input.begin()), l(input.end());
        if (qi::phrase_parse(f, l,  // input iterators
                    qi::omit [ *~qi::char_('#') ] >> '#' >> qi::int_, // grammar
                    qi::space,      // skipper
                    value))         // output attribute
        {
            std::cout << "     Input '" << input << "' -> " << value << "\n";
        }
    }
}

See it live on Coliru.

Boost Xpressive:

#include <boost/xpressive/xpressive_static.hpp>
#include <iostream>
#include <string>
namespace xp = boost::xpressive;
int main()
{
    std::string const tests[] = {
        "SomeString #222",
        "SomeString #1",
        "SomeString #42",
        "SomeString #-1"
    };
    for(auto& input : tests)
    {
        static xp::sregex rex = (xp::s1= -*xp::_) >> '#' >> (xp::s2= !xp::as_xpr('-') >> +xp::_d);
        xp::smatch what;
        if(xp::regex_match(input, what, rex))
        {
            std::cout << "Input '" << what[0] << "' -> " << what[2] << '\n';
        }
    }
}

Also live on Coliru.

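I have a hunch that the Spirit solution will perform better and come closer to what you want, since it parses a general grammar and writes the result directly into the data type you need (here an int).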