Regex C++: extract substring from a string then count each word

Last updated: 2023-10-16

I have a text string in the following format:

*tag0 hi how are you tag1 where are you from tag3 i would like to eat some food*

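The text is stored in a vector, and I assign each element to a string variable line2. I want to extract the words that follow each tag and count them as tokens. Here is my code: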

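#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> boxraw = {"tag0 hi how are you tag1 where are you from tag3 i would like to eat some food"};
    smatch t_headermatch;
    regex re("tag[0-9]+");
    for (size_t i = 0; i < boxraw.size(); ++i) {
        string line2 = boxraw.at(i);
        while (regex_search(line2, t_headermatch, re)) {
            for (const auto& x : t_headermatch) cout << x << " ";
            // TODO: if a tag header is found, print the words after it and count them as tokens,
            // stopping at the next tag header; exit when no tag is left.
            cout << endl;
            line2 = t_headermatch.suffix().str();
        }
    }
}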

My expected output is:

Found 3 tags
tag0
hi token 1
how token 2
are token 3
you token 4
tag1
where token 1
are token 2
you token 3
from token 4
tag3
i token 1
would token 2
like token 3
to token 4
eat token 5
some token 6
food token 7

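Use the following regex:

tag\d+((?:\s+(?!tag)\w+)+)

The negative lookahead (?!tag) ends the word group at the next tag header (it also rejects ordinary words that begin with "tag", such as "tagline"). In C++ source, write the pattern as a raw string literal, R"(tag\d+((?:\s+(?!tag)\w+)+))", or double every backslash; otherwise the compiler treats \d, \s and \w as unknown escape sequences instead of passing them to the regex engine.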

Each successful call to regex_search fills the match_results object you pass in (here t_headermatch, a std::smatch):

t_headermatch[0] : the whole match, i.e. "tag0 hi how are you"
t_headermatch[1] : the substring with the tokens, " hi how are you" (with a leading space, because \s+ is inside the group)

You still need to split that substring into individual tokens and count them.