Using Boost Spirit to parse a text file while skipping large parts of it

Updated: 2023-10-16

I have the following std::string:

<lots of text not including "label A" or "label B">    
label A: 34
<lots of text not including "label A" or "label B">
label B: 45
<lots of text not including "label A" or "label B">
...

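I would like to extract the single integer that follows every occurrence of label A or label B and place them in the corresponding vector<int> a, b. A simple but inelegant way is to call find("label A") and find("label B") and parse whichever comes first. Is there a succinct way to express this with Spirit? How do you skip everything except label A or label B?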
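For reference, the find-based approach described above might look roughly like the sketch below. The helper name extract_labels is hypothetical, and the sketch assumes each label is written exactly as "label A:" or "label B:" and is followed by an integer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of the question): scans `text` for
// "label A:" / "label B:" and appends the integer that follows to `a` or `b`.
void extract_labels(const std::string& text, std::vector<int>& a, std::vector<int>& b)
{
    const std::string tagA = "label A:";
    const std::string tagB = "label B:";
    std::size_t pos = 0;
    while (true) {
        const std::size_t pa = text.find(tagA, pos);
        const std::size_t pb = text.find(tagB, pos);
        if (pa == std::string::npos && pb == std::string::npos)
            break; // no more labels
        // npos is the largest size_t value, so the smaller position is the label that comes first
        const bool is_a = pa < pb;
        // both tags have the same length, so tagA.size() works for either
        const std::size_t start = (is_a ? pa : pb) + tagA.size();
        // atoi skips leading whitespace and yields 0 if no number follows
        (is_a ? a : b).push_back(std::atoi(text.c_str() + start));
        pos = start;
    }
}
```

Calling extract_labels(text, a, b) on the sample input would leave 34 in a and 45 in b. It runs both searches again on every iteration, which is part of what makes it inelegant.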

You can just use

omit [ eol >> *char_ - ("\nlabel A:") ] >> eol

Example: Live On Coliru

There is also the seek[] directive in the Spirit repository. The following is equivalent to the above:

 repo::seek [ eol >> &lit("int main") ] 

Here is a sample that parses your original input:

*repo::seek [ eol >> "label" >> char_("A-Z") >> ':' >> int_ ],

This parses into a std::vector<std::pair<char, int> > without needing anything else.

On Coliru Too:

#if 0
<lots of text not including "label A" or "label B">    
label A: 34
<lots of text not including "label A" or "label B">
label B: 45
<lots of text not including "label A" or "label B">
...
#endif
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
namespace qi   = boost::spirit::qi;
namespace repo = boost::spirit::repository::qi;
int main()
{
    // The program parses its own source; the sample input is in the #if 0 block above
    std::ifstream ifs("main.cpp");
    ifs >> std::noskipws; // keep whitespace so that eol can match
    boost::spirit::istream_iterator f(ifs), l;
    std::vector<std::pair<char, int> > parsed;
    using namespace qi;
    bool ok = phrase_parse(
            f, l, 
            // seek to each "label X: N" that follows a line break and collect (X, N)
            *repo::seek [ eol >> "label" >> char_("A-Z") >> ':' >> int_ ],
            blank,
            parsed
        );
    if (ok)
    {
        std::cout << "Found:\n";
        for (auto& p : parsed)
            std::cout << "'" << p.first << "' has value " << p.second << "\n";
    }
    else
        std::cout << "Fail at: '" << std::string(f,l) << "'\n";
}

Notes:

  • seek does expose the attribute it matched, which is quite powerful:

    repo::seek [ eol >> "label" >> char_("ABCD") >> ':' ] 
    

    will "eat" the label, but expose the label letter ('A', 'B', 'C', or 'D') as the attribute.

  • Performance when skipping can be quite surprising; read the warning in the documentation: http://www.boost.org/doc/libs/1_55_0/libs/spirit/repository/doc/html/spirit_repository/qi_components/directives/seek.html

Output
Found:
'A' has value 34
'B' has value 45
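
The question asked for two separate vectors a and b rather than a vector of pairs. One way to get there, sketched here in plain C++ after parsing rather than inside the grammar, is to distribute the pairs by label letter. The helper name split_by_label is hypothetical.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: distributes (label, value) pairs into `a` and `b`.
void split_by_label(const std::vector<std::pair<char, int> >& parsed,
                    std::vector<int>& a, std::vector<int>& b)
{
    for (const auto& p : parsed) {
        if (p.first == 'A')
            a.push_back(p.second);
        else if (p.first == 'B')
            b.push_back(p.second);
        // any other label letter is ignored in this sketch
    }
}
```

With the parsed result shown above, split_by_label(parsed, a, b) would leave 34 in a and 45 in b.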