How to parse and verify an ordered list of integers using qi

Keywords: integer list, verification, qi. Last updated: 2023-10-16

I am parsing a text file, possibly several GB in size, made up of lines like these:

11 0.1
14 0.78
532 -3.5

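Basically, one int and one float per line. The ints should be ordered and non-negative. I want to verify that the data is as described, and also return the minimum and maximum int in the range. This is what I came up with: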

#include <iostream>
#include <string>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/std_pair.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
namespace my_parsers
{
using namespace qi;
using px::at_c;
using px::val;
template <typename Iterator>
struct verify_data : grammar<Iterator, locals<int>, std::pair<int, int>()>
{
    verify_data() : verify_data::base_type(section)
    {
        // first index -> pair.first; the previous index is kept in the local _a,
        // and the last one ends up in pair.second
        section
            =  line(val(0))    [ at_c<0>(_val) = _1, _a = _1 ]
            >> +line(_a)       [ _a = _1 ]
            >> eps             [ at_c<1>(_val) = _a ]
            ;
        // _r1 is the previous index, passed in as an inherited attribute
        line
            %= (int_ >> other) [
                                   if_(_r1 >= _1)
                                   [
                                       std::cout << _r1 << " and "
                                       << _1 << val(" out of order\n")
                                   ]
                               ]
            ;
        // separator (space or tab), the float value, and the end of line
        other
            = omit[(lit(' ') | '\t') >> float_ >> eol];
    }
    rule<Iterator, locals<int>, std::pair<int, int>() > section;
    rule<Iterator, int(int)> line;
    rule<Iterator> other;
};
}
using namespace std;
int main(int argc, char** argv)
{
    string input("11 0.1\n"
                 "14 0.78\n"
                 "532 -3.6\n");
    my_parsers::verify_data<string::iterator> verifier;
    pair<int, int> p;
    std::string::iterator begin(input.begin()), end(input.end());
    cout << "parse result: " << boolalpha
         << qi::parse(begin, end, verifier, p) << endl; 
    cout << "p.first: " << p.first << "\np.second: " << p.second << endl;
    return 0;
}

What I'd like to know:

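Is there a better way to do this? I used inherited and synthesized attributes, locals and a bit of Phoenix voodoo. This is cool, and it's good to learn these tools, but I can't help thinking there might be a simpler way to achieve the same thing :/ (within a PEG parser, that is...)
For example, how could this be done without locals?

More info: I have other data formats being parsed at the same time, so I'd like to keep the return value as the parser attribute. At the moment it is a std::pair; the other data formats expose their own std::pair when parsed, and I'd like to put them all in a std::vector.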

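This at least is already a lot shorter:

Down to 28 LOC
No more locals
No more fusion vector at<> magic
No more inherited attributes
No more grammar class
No more manual iteration
Uses expectation points (see other) for better parse error reporting
This parser expression synthesizes neatly into a vector<int> if you choose to assign it with %= (but besides potentially allocating a largish array, that costs performance)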
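That last point is the usual memory-for-simplicity trade: once the indices sit in a vector<int>, the checks reduce to standard algorithms. A minimal sketch in plain standard C++17 (no Spirit), assuming the indices have already been collected; `check_indices` is a hypothetical name, not part of either listing.

```cpp
#include <algorithm>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper: returns {min, max} if the indices are non-negative
// and strictly increasing, std::nullopt otherwise.
std::optional<std::pair<int, int>> check_indices(const std::vector<int>& v)
{
    if (v.empty())
        return std::nullopt;
    // Strictly increasing means there is no adjacent pair with a >= b.
    if (std::adjacent_find(v.begin(), v.end(), std::greater_equal<int>()) != v.end())
        return std::nullopt;
    // In a strictly increasing sequence front() is the minimum,
    // so checking it alone covers the non-negativity requirement.
    if (v.front() < 0)
        return std::nullopt;
    return std::make_pair(v.front(), v.back());
}
```

For multi-GB input the vector itself can become the bottleneck, which is why the streaming versions keep only the running min and max.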

#include <iostream>
#include <string>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
namespace px = boost::phoenix;
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
int main(int argc, char** argv)
{
    std::string input("11 0.1\n"
            "14 0.78\n"
            "532 -3.6\n");
    int min=-1, max=0;
    {
        using namespace qi;
        using px::val;
        using px::ref;
        It begin(input.begin()), end(input.end());
        // keep the running max (reporting indices that don't exceed it),
        // and remember the first index as the min
        rule<It> index = int_ 
            [
                if_(ref(max) < _1)  [ ref(max) = _1 ] .else_ [ std::cout << _1 << val(" out of order\n") ],
                if_(ref(min) <  0)  [ ref(min) = _1 ]
            ] ;
        rule<It> other = char_(" \t") > float_ > eol;
        // +(index >> other) >> eoi instead of index % other: the list operator would
        // backtrack over the last " -3.6\n" and still report success
        std::cout << "parse result: " << std::boolalpha 
                  << qi::parse(begin, end, +(index >> other) >> eoi) << std::endl; 
    }
    std::cout << "min: " << min << "\nmax: " << max << std::endl;
    return 0;
}

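Bonus

I might suggest taking the validation out of the expression and making it a standalone function; of course, this makes things more verbose (and... legible), and my brain-dead sample uses global variables... but I trust you know how to use boost::bind or px::bind to make it more realistic.

In addition to the above:

Down to 27 LOC, even with the free function
No more Phoenix at all
No more Phoenix expression types bloating the binary and slowing it down in debug builds
No more var, ref, if_, .else_ and the wretched operator, (which at one point carried a major risk of bugs because its overload was not included by phoenix.hpp)
(Easily ported to C++0x lambdas, which immediately removes the need for global variables)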
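Getting rid of the globals mostly means giving the validator its own state object and binding it in. A minimal sketch in standard C++14, using `std::bind` in place of `boost::bind`/`px::bind` (the same idea); `index_state`, the two-argument `validate_index` and `make_validator` are hypothetical names. The result is a callable taking just the `int`, the same shape as the free function used as the semantic action, although whether Qi accepts this particular bind object directly is not something this sketch checks.

```cpp
#include <functional>
#include <iostream>

// Hypothetical state object replacing the globals min, max and linenumber.
struct index_state
{
    int min = -1;
    int max = -1;         // -1 (rather than 0) so that a first index of 0 is accepted
    int linenumber = 0;
    int out_of_order = 0;
};

// The same checks as the global-variable version, with the state passed in.
void validate_index(index_state& s, int index)
{
    ++s.linenumber;
    if (s.min < 0)
        s.min = index;
    if (s.max < index)
        s.max = index;
    else {
        ++s.out_of_order;
        std::cout << index << " out of order at line " << s.linenumber << '\n';
    }
}

// Binds one state object, yielding a callable that takes only the index.
auto make_validator(index_state& s)
{
    return std::bind(validate_index, std::ref(s), std::placeholders::_1);
}
```

Each parse can then own its `index_state`, so several inputs can be validated side by side.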
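The lambda remark can be illustrated without Spirit: the state becomes an ordinary local captured by reference, so nothing is global. A sketch in standard C++11, assuming indices are delivered through a callback; `for_each_index` is a hypothetical stand-in for the parser invoking its semantic action, and `index_summary`/`summarize` are likewise hypothetical.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a parser that invokes an action once per index.
void for_each_index(const std::vector<int>& indices,
                    const std::function<void(int)>& action)
{
    for (int index : indices)
        action(index);
}

struct index_summary
{
    int min;
    int max;
    int out_of_order;
};

// All state lives in a local that the lambda captures by reference,
// so no global variables are involved.
index_summary summarize(const std::vector<int>& indices)
{
    index_summary s{-1, -1, 0};
    for_each_index(indices, [&s](int index) {
        if (s.min < 0)
            s.min = index;
        if (s.max < index)
            s.max = index;
        else
            ++s.out_of_order;
    });
    return s;
}
```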

#include <iostream>
#include <string>
#include <boost/spirit/include/qi.hpp>
namespace qi = boost::spirit::qi;
typedef std::string::iterator It;
// globals only to keep the sample short
int min=-1, max=0, linenumber=0;
void validate_index(int index)
{
    linenumber++;
    if (min < 0)     min = index;
    if (max < index) max = index;
    else             std::cout << index << " out of order at line " << linenumber << std::endl;
}
int main(int argc, char** argv)
{
    std::string input("11 0.1\n"
            "14 0.78\n"
            "532 -3.6\n");
    It begin(input.begin()), end(input.end());
    {
        using namespace qi;
        // a plain function taking the int attribute serves as the semantic action
        rule<It> index = int_ [ validate_index ] ;
        rule<It> other = char_(" \t") > float_ > eol;
        std::cout << "parse result: " << std::boolalpha 
                  << qi::parse(begin, end, +(index >> other) >> eoi) << std::endl; 
    }
    std::cout << "min: " << min << "\nmax: " << max << std::endl;
    return 0;
}

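I think a much simpler way is to parse the file with standard stream operations and then check the ordering in a loop. First, the input:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <istream>
#include <iterator>

// A struct of our own rather than a typedef of std::pair: std::istream_iterator
// only finds the operator>> below through argument-dependent lookup, which
// does not search the global namespace for std::pair.
struct value_pair {
    int first;
    float second;
};
bool greater(const value_pair & left, const value_pair & right) {
    return left.first > right.first;
}
std::istream & operator>>(std::istream & stream, value_pair & value) {
    stream >> value.first >> value.second;
    return stream;
}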

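Use it like this:

std::ifstream file("your_file.txt");
std::istream_iterator<value_pair> it(file);
std::istream_iterator<value_pair> eof;
// std::adjacent_find formally requires forward iterators; istream_iterator is
// only an input iterator, so this relies on the library tolerating that.
if(std::adjacent_find(it, eof, greater) != eof) {
    std::cout << "The values are not ordered" << std::endl;
}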
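The adjacent_find check above only reports whether the values are ordered. If the minimum and maximum are also needed (as the question asks) and malformed or negative lines should be caught, one explicit loop over the stream does all of it in a single pass with constant memory. A sketch in standard C++ under those assumptions; `scan_result` and `scan` are hypothetical names, and each line is assumed to hold exactly one int and one float.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result of one pass over "int float" lines.
struct scan_result
{
    bool ok = true;   // false at the first bad line
    long line = 0;    // lines read so far (the offending line if !ok)
    int min = -1;     // stays -1 for empty input
    int max = -1;
};

scan_result scan(std::istream& in)
{
    scan_result r;
    std::string text;
    while (std::getline(in, text)) {
        ++r.line;
        std::istringstream ls(text);
        int index;
        float value;
        char extra;
        // Exactly one int and one float; any leftover non-space text is an error.
        bool parsed = (ls >> index >> value) && !(ls >> extra);
        if (!parsed || index < 0 || (r.line > 1 && index <= r.max)) {
            r.ok = false;
            return r;
        }
        if (r.line == 1)
            r.min = index;
        r.max = index;  // indices increase, so the latest one is the maximum
    }
    return r;
}
```

With a file, `std::ifstream file("your_file.txt"); scan_result r = scan(file);` would be the equivalent call.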

I find this much simpler.