C++ boost regex: How to find all possible string constants from C/C++ code?

Last updated: 2023-10-16

I need to detect, from C++ code, every possible C/C++ string constant:

std::string s = "dummy text"; // comment
std::string s = "dummier text about \"nothing\""; // don't worry
std::string multiLineString = "dummy multiline \
"another line";
std::string s1="aaa", s2="bbb";
std::string multiString="aaa" "bbb";
std::string division="a/b=c";

Also:

char c = '"';
char c = '\t';
char c = '\'';
char c = '\\';

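From the code above I want to extract:

"dummy text"
"dummier text about \"nothing\""
"dummy multiline \
"aaa"
"a/b=c"
'"'
'\t'
'\''
'\\'

Note: I process the text line by line, so I only need the first string on each line, for example: "dummy multiline \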

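I first made an attempt of my own, and then Alan's solution turned out to be very useful: "Finding quoted strings with escaped quotes in C# using a regular expression".
In the end I wrote this program: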

#include <iostream>
#include <string>
#include <boost/regex.hpp>
boost::regex regex2quotes;
void initRegex()
{
    // "Any character that is not a backslash", repeated lazily.
    std::string notDQuota = "((?!\\\\).)*?";
    std::string dQuota = "[\"]";
    // Same idea for character literals, limited to one or two characters.
    std::string notSQuota = "((?!\\\\).){1,2}?";
    std::string sQuota = "[']";
    std::string dQuotaExpression = '(' + dQuota + notDQuota + dQuota + ')';
    std::string sQuotaExpression = '(' + sQuota + notSQuota + sQuota + ')';
    std::string finalExpression = dQuotaExpression + '|' + sQuotaExpression;
    std::cout << "Regex>>>>" << finalExpression << "<<<<<\n\n";
    regex2quotes = finalExpression;
}
void checkIfFound(std::string text)
{
    std::cout << "text>>>>>" << text << "<<<\n";
    boost::smatch result;
    bool found = boost::regex_search(text, result, regex2quotes);
    if(found)
        std::cout << "Found====" << result[0] << "====\n";
    else
        std::cout << "!!!Text not found in: " << text << std::endl; 
}
int main(int argc, char *argv[])
{
    initRegex();
    checkIfFound("std::string s = \"dummy text\"; // comment");
    checkIfFound("std::string s = \"dummier text about \\\"nothing\\\"\"; // don't worry");
    checkIfFound("std::string multiLineString = \"dummy \\\n\
                \"another line\";");
    checkIfFound("std::string s1=\"aaa\", s2=\"bbb\";");
    checkIfFound("std::string multiString=\"aaa\" \"bbb\";");
    checkIfFound("std::string division=\"a/b=c\";");
    checkIfFound("\"text\";");
    checkIfFound("char c = '\"';");
    checkIfFound("char c = '\n';");
    checkIfFound("char c = '\\'';");
    checkIfFound("char c = '\\\\';");
    return 0;
}

Unfortunately, it does not extract all of the test cases I need. Output:

Regex>>>>(["]((?!\\).)*?["])|([']((?!\\).){1,2}?['])<<<<<
text>>>>>std::string s = "dummy text"; // comment<<<
Found===="dummy text"====
text>>>>>std::string s = "dummier text about \"nothing\""; // don't worry<<<
Found====""====
text>>>>>std::string multiLineString = "dummy \
                "another line";<<<
Found===="another line"====
text>>>>>std::string s1="aaa", s2="bbb";<<<
Found===="aaa"====
text>>>>>std::string multiString="aaa" "bbb";<<<
Found===="aaa"====
text>>>>>std::string division="a/b=c";<<<
Found===="a/b=c"====
text>>>>>"text";<<<
Found===="text"====
text>>>>>char c = '"';<<<
Found===='"'====
text>>>>>char c = ' ';<<<
Found===='  '====
text>>>>>char c = '\'';<<<
!!!Text not found in: char c = '\'';
text>>>>>char c = '\\';<<<
!!!Text not found in: char c = '\\';

Can you give me some advice? Is it possible to detect these with a regular expression at all?

I have a working regex for you. The regex is:

(\/\*.*?\*\/)|(\/\/.*$)|"((?:[^"\n\r]|\\.)*)\\$|"((?:\\.|[^"\n\r])*)"|(?:'(\\?.)')

See it working here.

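The catch is that I don't know whether boost is good enough to implement it. What you need to do is run one match and then check whether capture group 1 or 2 matched. If so, the match is a comment and should be ignored.

If one of the other capture groups matched (3, 4 or 5), it is a "string" constant. (3 is a string ending in \, 4 is a "normal" string, 5 is a character.) Then repeat until no further match is found.

The improvement over your attempt is that it also handles comments, both /* ... */ and //.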
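The loop described above can be sketched as follows. This is a minimal illustration, not the answer's own implementation: it uses std::regex from the standard library instead of Boost so that it has no external dependency, and the function name extractLiterals is hypothetical. It assumes the input is a single line, because in std::regex's default ECMAScript grammar $ only matches at the end of the whole input.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the contents of every string or character
// literal on one line of C/C++ code, skipping anything inside comments.
// Assumption: `line` holds a single line (no embedded newlines).
std::vector<std::string> extractLiterals(const std::string& line)
{
    // Groups: 1 = /* */ comment, 2 = // comment,
    //         3 = string ending in a backslash, 4 = normal string, 5 = char.
    static const std::regex re(
        R"RX((/\*.*?\*/)|(//.*$)|"((?:[^"\n\r]|\\.)*)\\$|"((?:\\.|[^"\n\r])*)"|(?:'(\\?.)'))RX");

    std::vector<std::string> found;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(line.begin(), line.end(), re); it != end; ++it) {
        const std::smatch& m = *it;
        // Groups 1 and 2 are comments: consume them, but report nothing.
        for (int idx = 3; idx <= 5; ++idx) {
            if (m[idx].matched)
                found.push_back(m[idx].str());
        }
    }
    return found;
}
```

Calling extractLiterals on the line std::string s1="aaa", s2="bbb"; /* "Not a string" */ yields aaa and bbb. The quoted text inside the comment is swallowed by group 1 and never reported.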

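I don't know why you don't want to handle multi-line strings.

std::string multiLineString = "dummy multiline \
"another line";

is not legal C++ code. If it were

std::string multiLineString = "dummy multiline \
another line";

it would be. But then you can't process the lines one by one. You would have to run the whole code through as a single block. But I'm sure you'll figure that out.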
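One way to treat the code as a single block is to join backslash-continued lines before matching, which is what the compiler itself does before tokenizing. The sketch below is an assumption about how that could be done; spliceLines is a hypothetical name, and it does not special-case raw string literals, where the compiler undoes the splicing.

```cpp
#include <string>

// Hypothetical helper: removes every backslash immediately followed by a
// line ending ("\n" or "\r\n"), so a literal continued over several
// physical lines becomes one logical line.
std::string spliceLines(const std::string& src)
{
    std::string out;
    out.reserve(src.size());
    for (std::string::size_type i = 0; i < src.size(); ++i) {
        if (src[i] == '\\') {
            // Backslash + LF: drop both characters.
            if (i + 1 < src.size() && src[i + 1] == '\n') {
                i += 1;
                continue;
            }
            // Backslash + CR LF: drop all three characters.
            if (i + 2 < src.size() && src[i + 1] == '\r' && src[i + 2] == '\n') {
                i += 2;
                continue;
            }
        }
        out += src[i];
    }
    return out;
}
```

After splicing, the legal two-line example becomes std::string multiLineString = "dummy multiline another line"; on one line, so an ordinary string pattern can match it without a special case for a trailing backslash.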

Hope this helps.

Edit:

I couldn't let this one go ;) Here is the code:

#include "stdafx.h"
#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
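// Groups: 1 = /* */ comment, 2 = // comment, 3 = string ending in '\', 4 = normal string, 5 = char
string sRE = "(\\/\\*.*?\\*\\/)|(\\/\\/.*$)|\"((?:[^\"\\n\\r]|\\\\.)*)\\\\$|\"((?:\\\\.|[^\"\\n\\r])*)\"|(?:'(\\\\?.)')";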
regex re(sRE);
void checkIfFound(string text)
{
    string::const_iterator start = text.begin();
    string::const_iterator end   = text.end();
    smatch what;
    while (regex_search(start, end, what, re))
    {
        for( int idx=3; idx<=5; idx++ )
        {
            if( what[idx].matched )
                cout << "Static text found >>>" << string(what[idx].first, what[idx].second) << "<<<" << endl;
        }
        // Update the beginning of the range to the character
        // following the whole match
        start = what[0].second;
    }
}
int _tmain(int argc, char* argv[])
{
    cout << "Regex:\r\n" << sRE << "\n\n";
    checkIfFound("std::string s = \"dummy text\"; // comment");
    checkIfFound("std::string s = \"dummier text about \\\"nothing\\\"\"; // don't worry");
    checkIfFound("std::string multiLineString = \"dummy \\\n\
                 \"another line\";");
    checkIfFound("std::string s1=\"aaa\", s2=\"bbb\"; /* \"Not a string\" */");
    checkIfFound("std::string multiString=\"aaa\" \"bbb\";");
    checkIfFound("std::string division=\"a/b=c\";");
    checkIfFound("\"text\";");
    checkIfFound("char c = '\"';");
    checkIfFound("char c = '\n';");
    checkIfFound("char c = '\\'';");
    checkIfFound("char c = '\\\\';");
    return 0;
}

Output:

Regex:
(\/\*.*?\*\/)|(\/\/.*$)|"((?:[^"\n\r]|\\.)*)\\$|"((?:\\.|[^"\n\r])*)"|(?:'(\\?.)')
Static text found >>>dummy text<<<
Static text found >>>dummier text about \"nothing\"<<<
Static text found >>>dummy <<<
Static text found >>>another line<<<
Static text found >>>aaa<<<
Static text found >>>bbb<<<
Static text found >>>aaa<<<
Static text found >>>bbb<<<
Static text found >>>a/b=c<<<
Static text found >>>text<<<
Static text found >>>"<<<
Static text found >>>
<<<
Static text found >>>\'<<<
Static text found >>>\\<<<