Regex to find complex parameters

Keywords: parameters, complex, find, Regex      Updated: 2023-10-16

I am trying to find all parameters, together with their values, in a string of the following form:

pN  stands for the Nth parameter. It can be composed of the following chars:
    letters, numbers, and any char included in kSuportedNamesCharsRegEx
vNX stands for the Xth component of the value of the Nth parameter.
    vNX accepts arithmetic expressions, which is why I constructed kSuportedValuesCharsRegEx. Additionally, a value may be a simple or nested list.

Here is an example of a string to parse:

p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5

from which I should obtain "p1", "p2 = (v21 + v22)", "p3 = v31-v32", "p4", "p5 = v5".

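As you can see, a parameter may or may not have a value. I am using the C++ Boost libraries (so I assume lookbehind is not available to me). Until now I only had to handle parameters that have a value, so I have been using the following code: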

static const std::string kSpecialCharsRegEx = "\\.\\{\\}\\(\\)\\\\\\*\\-\\+\\?\\|\\^\\$";
static const std::string kSuportedNamesCharsRegEx = "[A-Za-z0-9çÇñÑáÁéÉíÍóÓúÚ@%_:;,<>/"
    + kSpecialCharsRegEx + "]+";
static const std::string kSuportedValuesCharsRegEx   = "([\\s\"A-Za-z0-9çÇñÑáÁéÉíÍóÓúÚ@%_:;,<>/"
    + kSpecialCharsRegEx + "]|(==)|(>=)|(<=))+";
static const std::string kSimpleListRegEx    = "\\[" + kSuportedValuesCharsRegEx + "\\]";
static const std::string kDeepListRegEx  = "\\[(" + kSuportedValuesCharsRegEx + "|(" + kSimpleListRegEx + "))+\\]";
// Main idea
//static const std::string stackRegex = "\\w+\\s*=\\s*[\\w\\s]+(?=\\s+\\w+=)"
//          "|\\w+\\s*=\\s*[\\w\\s]+(?!\\w+=)"
//          "|\\w+\\s*=\\s*\\[[\\w\\s]+\\]";
// + deep listing support
    // Main regex
static const std::string kParameterRegEx =
    "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*" + kSuportedValuesCharsRegEx + "(?=\\s+\\b" + kSuportedNamesCharsRegEx + "\\b=)"
    + "|"
    + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*" + kSuportedValuesCharsRegEx +"(?!" + kSuportedNamesCharsRegEx + "=)"
    + "|"
    + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*(" + kDeepListRegEx + ")";

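However, I now need to handle parameters without a value as well, and I am having trouble building the right regular expression.

Can anyone help me with this? Thanks in advance.

As mkaes suggested, all you need here is a simple grammar. Here is a Boost Spirit approach: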

op         = char_("-+/*");
name       = +(graph - '='); // excluding `op` is not even necessary here
simple     = +(graph - op);
expression = raw [
             '(' >> expression >> ')'
            | simple >> *(op >> expression)
            ];
value      = expression;
definition = name >> - ('=' > value);
start      = *definition;

See it Live On Coliru

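raw[] is there so that we can ignore the structure of the expression entirely and use it only for tokenizing/validation. Apart from operator characters, I accept any non-space characters for names.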

Usage:

int main()
{
    using It = std::string::const_iterator;
    std::string const input = "p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5";
    It first(input.begin()), last(input.end());
    Definitions defs;
    if (qi::phrase_parse(first, last, grammar<It>(), qi::space, defs))
    {
        std::cout << "Parsed " << defs.size() << " definitions\n";
        for (auto const& def : defs)
        {
            std::cout << def.name;
            if (def.value)
                std::cout << " with value expression '" << *def.value << "'\n";
            else
                std::cout << " with no value expression\n";
        }
    } else
    {
        std::cout << "Parse failed\n";
    }
    if (first != last)
        std::cout << "Remaining unparsed input: '" << std::string(first,last) << "'\n";
}

Prints:

Parsed 5 definitions
p1 with no value expression
p2 with value expression '(v21 +  v22)'
p3 with value expression 'v31-v32'
p4 with no value expression
p5 with value expression 'v5'
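
For readers who cannot use Boost, the same grammar idea can be sketched as a small hand-written recursive-descent scanner in standard C++. This is only an illustration under stated assumptions, not the code from the answer: the names Definition (redefined here with a plain bool flag), isOp, skipSpaces, parseSimple, parseExpression and parseDefinitions are hypothetical, and unlike the rules above this sketch also excludes parentheses from simple terms and allows an operator tail after a parenthesized group.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not the Spirit-adapted struct).
struct Definition {
    std::string name;
    bool hasValue;      // false when the parameter has no "= value" part
    std::string value;  // raw text of the value expression
};

bool isOp(char c) { return c == '+' || c == '-' || c == '*' || c == '/'; }

void skipSpaces(const std::string& s, std::size_t& i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
}

// simple: one or more visible chars that are neither operators nor parentheses
// (assumption: parentheses are excluded here, unlike `graph - op`).
bool parseSimple(const std::string& s, std::size_t& i) {
    const std::size_t start = i;
    while (i < s.size() && std::isgraph(static_cast<unsigned char>(s[i])) &&
           !isOp(s[i]) && s[i] != '(' && s[i] != ')')
        ++i;
    return i > start;
}

// expression: ('(' expression ')' | simple) followed by any number of "op expression".
bool parseExpression(const std::string& s, std::size_t& i) {
    skipSpaces(s, i);
    if (i < s.size() && s[i] == '(') {
        ++i;
        if (!parseExpression(s, i)) return false;
        skipSpaces(s, i);
        if (i >= s.size() || s[i] != ')') return false;
        ++i;
    } else if (!parseSimple(s, i)) {
        return false;
    }
    for (;;) {
        const std::size_t save = i;  // position before any trailing whitespace
        skipSpaces(s, i);
        if (i < s.size() && isOp(s[i])) {
            ++i;
            if (parseExpression(s, i)) continue;
        }
        i = save;  // no further operand: leave trailing input untouched
        return true;
    }
}

// start: *(name >> -('=' value)); stops at the first thing it cannot parse.
std::vector<Definition> parseDefinitions(const std::string& s) {
    std::vector<Definition> defs;
    std::size_t i = 0;
    for (;;) {
        skipSpaces(s, i);
        const std::size_t nameStart = i;
        while (i < s.size() && std::isgraph(static_cast<unsigned char>(s[i])) && s[i] != '=')
            ++i;
        if (i == nameStart) break;  // end of input or a stray '='
        Definition def{s.substr(nameStart, i - nameStart), false, ""};
        const std::size_t afterName = i;
        skipSpaces(s, i);
        if (i < s.size() && s[i] == '=') {
            ++i;
            skipSpaces(s, i);
            const std::size_t valueStart = i;
            if (!parseExpression(s, i)) break;  // '=' without a valid value
            def.hasValue = true;
            def.value = s.substr(valueStart, i - valueStart);
        } else {
            i = afterName;  // no value: the next token is another name
        }
        defs.push_back(def);
    }
    return defs;
}
```

Calling parseDefinitions on the sample string from the question should give the same five definitions as the Spirit grammar, with the value text kept verbatim, including inner whitespace.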

Full code for reference:

#include <boost/fusion/adapted/struct.hpp>
#include <boost/optional.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
struct Definition {
    std::string name;
    boost::optional<std::string> value;
};
BOOST_FUSION_ADAPT_STRUCT(Definition, (std::string, name)(boost::optional<std::string>, value))
using Definitions = std::vector<Definition>;
template <typename Iterator, typename Skipper = qi::space_type>
struct grammar : qi::grammar<Iterator, Definitions(), Skipper>
{
    grammar() : grammar::base_type(start) {
        using namespace qi;
        name       = +(graph - '=');
        simple     = name;
        expression = raw [
                '(' >> expression >> ')'
              | simple >> *(char_("+-/*") >> expression)
              ];
        value      = expression;
        definition = name >> - ('=' > value);
        start      = *definition;
    }
  private:
    qi::rule<Iterator> simple;
    qi::rule<Iterator, std::string(), Skipper> expression, value;
    qi::rule<Iterator, std::string()/*no skipper*/> name;
    qi::rule<Iterator, Definition(),  Skipper> definition;
    qi::rule<Iterator, Definitions(), Skipper> start;
};
int main()
{
    using It = std::string::const_iterator;
    std::string const input = "p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5";
    It f(input.begin()), l(input.end());
    Definitions defs;
    if (qi::phrase_parse(f, l, grammar<It>(), qi::space, defs))
    {
        std::cout << "Parsed " << defs.size() << " definitions\n";
        for (auto const& def : defs)
        {
            std::cout << def.name;
            if (def.value)
                std::cout << " with value expression '" << *def.value << "'\n";
            else
                std::cout << " with no value expression\n";
        }
    } else
    {
        std::cout << "Parse failed\n";
    }
    if (f != l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}

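I think I have found a solution to my problem, working together with a colleague.

The main idea is captured in the following example: http://regexr.com/38tjv

The regular expression: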

(?:^|\s)(\b[a-zA-Z0-9]+\b|\b[a-zA-Z0-9]+\b\s*=\s*\b[a-zA-Z0-9\s+()]+?\b)(?=\s+\b[a-zA-Z0-9]+\b\s*=|\s*$|\s+\b[a-zA-Z0-9]+\b)

It is explained below:

    static const std::string kParameterRegEx = std::string("(?:^|\\s)")                                    // start of string or a preceding space, not captured
        + "("                                                                                               // group of the parameter or parameter-value
            + "\\b" + kSuportedNamesCharsRegEx + "\\b"                                                    //      simple names
            + "|"                                                                                           //      or
            + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*\\b" + kSuportedValuesCharsRegEx + "?\\b" //      name-value
        + ")"                                                                                               // end group
        + "(?="                                                                                             // followed by group of
            + "\\s+\\b" + kSuportedNamesCharsRegEx + "\\b\\s*="                                          //      new parameter with value
            + "|"                                                                                           //      or
            + "\\s*$"                                                                                      //      end of string
            + "|"                                                                                           //      or
            + "\\s+\\b" + kSuportedNamesCharsRegEx + "\\b"                                                //      new parameter without value
        + ")";                                                                                              // end of following group
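
To try the idea in isolation, here is a minimal sketch that runs the simplified expression from the regexr example. It assumes std::regex (ECMAScript syntax) rather than Boost.Regex, and the helper name findParameters is hypothetical. The simplified character classes contain no '-', and the \b in front of the value requires the value to start with a word character, so the sketch is meant for inputs such as "p1 p2 = v21 + v22 p3=v31 p4 p5=v5" rather than values that start with '(' or contain '-'.

```cpp
#include <regex>
#include <string>
#include <vector>

// Simplified pattern from the regexr example: names are [a-zA-Z0-9]+ and
// values may additionally contain whitespace, '+', '(' and ')'.
static const std::regex kSimplifiedParameterRegEx(
    R"re((?:^|\s)(\b[a-zA-Z0-9]+\b|\b[a-zA-Z0-9]+\b\s*=\s*\b[a-zA-Z0-9\s+()]+?\b)(?=\s+\b[a-zA-Z0-9]+\b\s*=|\s*$|\s+\b[a-zA-Z0-9]+\b))re");

// Hypothetical helper (not from the original post): collects capture group 1,
// i.e. the "name" or "name = value" text, of every match in the input.
std::vector<std::string> findParameters(const std::string& input) {
    std::vector<std::string> found;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), kSimplifiedParameterRegEx);
         it != end; ++it) {
        found.push_back((*it)[1].str());
    }
    return found;
}
```

For the input "p1 p2 = v21 + v22 p3=v31 p4 p5=v5" this should yield "p1", "p2 = v21 + v22", "p3=v31", "p4" and "p5=v5". Group 1 is used rather than the whole match because the leading (?:^|\s) consumes the separating space.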

I hope it helps others who need to parse Cadence Spectre circuits.