Regex in std C++

Keywords: Regex, standard C++. Last updated: 2023-10-16

I want to find all occurrences of text like "{some text}".

My code is:

std::wregex e(L"(\\{([a-z]+)\\})");
std::wsmatch m;

std::regex_search(chatMessage, m, e);
std::wcout << "matches for '" << chatMessage << "'\n";
for (size_t i = 0; i < m.size(); ++i) {
    std::wssub_match sub_match = m[i];
    std::wstring sub_match_str = sub_match.str();
    std::wcout << i << ": " << sub_match_str << '\n';
}

But for a string like L"玫瑰{aaa}{bbb}是{ccc}#ff0000", my output is:

0: {aaa}
1: {aaa}
2: aaa

and I never get the remaining substrings. I suspect something is wrong with my regex. Does anyone know what is going wrong?

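You search only once and then simply loop over the groups of that single match. Instead, you need to search repeatedly and pick out only the group you want each time. Try: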

std::wregex e(L"(\\{([a-z]+)\\})");
std::wsmatch m;
std::wcout << "matches for '" << chatMessage << "'\n";
// Note: this loop consumes chatMessage; work on a copy if you need it afterwards.
while (std::regex_search(chatMessage, m, e))
{
    std::wssub_match sub_match = m[2];
    std::wstring sub_match_str = sub_match.str();
    std::wcout << sub_match_str << '\n';
    chatMessage = m.suffix().str(); // this advances the position in the string
}

The 2 here refers to the second group, i.e. the second parenthesized part of the pattern, which is ([a-z]+).

For more information on groups, see this.
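The loop above can be wrapped in a small function so that the caller's string is left untouched. This is only a sketch: the name `extract_braced` is made up for illustration, and taking the argument by value is an assumption chosen so that the loop consumes a private copy.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): collects the text
// inside every {lowercase} token. The parameter is taken by value so the
// loop below shortens a private copy rather than the caller's string.
std::vector<std::wstring> extract_braced(std::wstring text) {
    const std::wregex e(L"(\\{([a-z]+)\\})");
    std::vector<std::wstring> found;
    std::wsmatch m;
    while (std::regex_search(text, m, e)) {
        found.push_back(m[2].str());  // group 2: the letters between the braces
        text = m.suffix().str();      // keep only what follows this match
    }
    return found;
}
```

For an input such as `L"x{aaa}{bbb}y{ccc}#ff0000"`, this should return `aaa`, `bbb` and `ccc` in that order. Copying the suffix on every iteration is simple but costs an allocation per match.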

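There is nothing wrong with your regular expression, but you need to search with it repeatedly. And you don't really need the outer parentheses: the whole match is always available as m[0].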

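std::regex_search finds one occurrence of the pattern. Here that is {aaa}, and that is what the std::wsmatch holds. It has 3 sub-matches: the whole match, the contents of the outer parentheses (the whole match again), and the contents of the inner parentheses. That is exactly what you are seeing.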
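To see how the number of sub-matches depends on the parentheses in the pattern, here is a minimal sketch. The helper name `first_match_groups` is hypothetical; it runs a single search and returns every sub-match as text.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: performs one search and returns all sub-matches of
// that single match (index 0 is the whole match, then one per capture group).
std::vector<std::wstring> first_match_groups(const std::wstring& text,
                                             const std::wregex& e) {
    std::vector<std::wstring> groups;
    std::wsmatch m;
    if (std::regex_search(text, m, e)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            groups.push_back(m[i].str());
        }
    }
    return groups;
}
```

With the original pattern `L"(\\{([a-z]+)\\})"` and the text `L"x{aaa}{bbb}"`, this should give `{aaa}`, `{aaa}`, `aaa`. With the outer parentheses dropped, `L"\\{([a-z]+)\\}"`, it should give just `{aaa}` and `aaa`, and the inner text moves to index 1.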

You have to call regex_search again on the rest of the string to get the next match:

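std::wstring::const_iterator begin = chatMessage.begin(), end = chatMessage.end();
while (std::regex_search(begin, end, m, e)) {
    // ...
    begin = m[0].second; // one past the end of the whole match (m.end() would iterate over sub-matches, not the string)
}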
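Because this variant never copies the string, each match still points into the original text, so offsets can be recovered. The following is a sketch under assumptions: the name `find_braced` and the pair-based return type are illustrative choices, and the pattern is the simplified one without outer parentheses.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (offset of the whole match in text, inner text)
// for every {lowercase} token, searching the remaining range each time.
std::vector<std::pair<std::size_t, std::wstring>>
find_braced(const std::wstring& text) {
    const std::wregex e(L"\\{([a-z]+)\\}");
    std::vector<std::pair<std::size_t, std::wstring>> hits;
    std::wsmatch m;
    std::wstring::const_iterator begin = text.begin(), end = text.end();
    while (std::regex_search(begin, end, m, e)) {
        // m[0].first points at the opening brace inside the original string.
        const auto offset = static_cast<std::size_t>(m[0].first - text.begin());
        hits.emplace_back(offset, m[1].str());
        begin = m[0].second;  // resume just past the whole match
    }
    return hits;
}
```

This pattern cannot match an empty string, so the loop always advances. A pattern that can match zero characters would need extra care to avoid looping forever at the same position.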

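The index operator on a match_results object (here, the std::wsmatch) returns the matched substring at that index. With index 0 it returns the whole match, which is why the first line of output is {aaa}. With index 1 it returns the contents of the first capture group, i.e. the text matched by the part of the regular expression between the first ( and its corresponding ). In this example those are the outermost parentheses, which again yields {aaa}. With index 2 it returns the contents of the second capture group, i.e. the text matched between the second ( and its corresponding ), which is aaa.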

The easiest way to search again from where you left off is to use an iterator:

std::wsregex_iterator it(chatMessage.begin(), chatMessage.end(), e);
for ( ; it != std::wsregex_iterator(); ++it) {
    std::wcout << it->str() << '\n'; // *it is a std::wsmatch; str() is the whole match
}

(Note: this is a sketch and has not been tested.)
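A slightly fuller sketch of the iterator approach, collecting only the inner text of each match. The function name `extract_with_iterator` is hypothetical, and the pattern is the simplified one without outer parentheses, so the letters are in group 1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: walks over all matches with std::wsregex_iterator.
// The regex object must outlive the iterator, which it does here.
std::vector<std::wstring> extract_with_iterator(const std::wstring& text) {
    const std::wregex e(L"\\{([a-z]+)\\}");
    std::vector<std::wstring> found;
    const std::wsregex_iterator last;  // default-constructed: end of sequence
    for (std::wsregex_iterator it(text.begin(), text.end(), e); it != last; ++it) {
        found.push_back((*it)[1].str());  // *it is a std::wsmatch
    }
    return found;
}
```

The iterator handles the "continue after the previous match" bookkeeping itself, so there is no suffix copying or manual iterator update.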