std::regex_match and lazy quantifier with strange behavior

Keywords: quantifier, regex, match, std. Last updated: 2023-10-16

I know that:
A lazy quantifier matches as few characters as possible (shortest match).

I also know this constructor:

basic_regex( ...,
            flag_type f = std::regex_constants::ECMAScript );

And:
ECMAScript supports non-greedy matches,
and the ECMAScript regex "<tag[^>]*>.*?</tag>"
would match only until the first closing tag ... en.cppreference

And:
At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... en.cppreference

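And:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences ... std::regex_match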

Here is my code: live

#include <iostream>
#include <string>
#include <regex>
int main(){
        std::string string( "s/one/two/three/four/five/six/g" );
        std::match_results< std::string::const_iterator > match;
        std::basic_regex< char > regex ( "s?/.+?/g?" );  // non-greedy
        bool test = false;
        using namespace std::regex_constants;
        // OK: regex_search honors the lazy quantifier .+?
        test = std::regex_search( string, match, regex );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
        // regex_match does not seem to honor the lazy quantifier .+?
        test = std::regex_match( string, match, regex, match_not_bol | match_not_eol );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
}

And the output:

1
s/one/
1
s/one/two/three/four/five/six/g
Process returned 0 (0x0)   execution time : 0.008 s
Press ENTER to continue.

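std::regex_match should not match anything; it should return 0 with the non-greedy quantifier .+?

In fact, the non-greedy quantifier .+? here has the same meaning as the greedy one, and both /.+?/ and /.+/ match the same string, even though they are different patterns. So the question is: why is the question mark ignored?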

Regex101

Quick test:

$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
$ s/one/
$
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
$ s/one/two/three/four/five/six/g

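Note
This regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); is non-greedy,
and this one: std::basic_regex< char > regex ( "s?/.+/g?" ); is greedy.
With std::regex_match both produce the same output: both still match the entire string!
With std::regex_search they produce different output.
The s? and g? do not matter either, and with /.*?/ it still matches the entire string!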

More details

g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901

I don't see any inconsistency. regex_match tries to match the whole string, so s?/.+?/g? lazily expands until the whole string is covered.

These "diagrams" (for regex_search) will hopefully help to get the idea of greediness:

Non-greedy:
a.*?a: ababa
a|.*?a: a|baba
a.*?|a: a|baba  # ok, let's try .*? == "" first
# can't go further, backtracking
a.*?|a: ab|aba  # let's try .*? == "b" now
a.*?a|: aba|ba
# If the regex were a.*?a$, there would be two extra backtracking
# steps such that .*? == "bab".
Greedy:
a.*a: ababa
a|.*a: a|baba
a.*|a: ababa|  # try .* == "baba" first
# backtrack
a.*|a: abab|a  # try .* == "bab" now
a.*a|: ababa|

And regex_match( abc ) in this case behaves like regex_search( ^abc$ ).