Read empty values with boost::spirit

Keywords: empty values, reading, spirit, boost      Updated: 2023-10-16

I want to read a CSV into a struct:

struct data 
{
   std::string a;
   std::string b;
   std::string c;
};

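However, I want to read even the empty strings, to make sure every value ends up in its proper place. I adapted the struct to a boost::fusion sequence, so the following works: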

// Our parser (using a custom skipper to skip comments and empty lines )
template <typename Iterator, typename skipper = comment_skipper<Iterator> >
  struct google_parser : qi::grammar<Iterator, addressbook(), skipper>
{
  google_parser() : google_parser::base_type(contacts, "contacts")
  {
    using qi::eol;
    using qi::eps;
    using qi::_1;
    using qi::_val;
    using qi::repeat;
    using standard_wide::char_;
    using phoenix::at_c;
    using phoenix::val;
    value = *(char_ - ',' - eol) [_val += _1];
    // This works but only for small structs
    entry %= value >> ',' >> value >> ',' >> value >> eol;
  }
  qi::rule<Iterator, std::string()> value;
  qi::rule<Iterator, data()> entry;
};

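Unfortunately, repeat stores all non-empty values in a vector, so the values of the attributes can get mixed up (i.e., if field b is empty, it may end up containing the content of c):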

    entry %= repeat(2)[ value >> ','] >> value >> eol;

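I want to use a short rule similar to repeat, because in practice my struct has 60 attributes! Writing 60 rules is not only tedious, it also seems that Boost doesn't like long rules...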

You just need to make sure that you parse a value for "empty" strings too.

value = +(char_ - ',' - eol) | attr("(unspecified)");
entry = value >> ',' >> value >> ',' >> value >> eol;

See it in a demo:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;
struct data {
    std::string a;
    std::string b;
    std::string c;
};
BOOST_FUSION_ADAPT_STRUCT(data, (std::string, a)(std::string, b)(std::string, c))
template <typename Iterator, typename skipper = qi::blank_type>
struct google_parser : qi::grammar<Iterator, data(), skipper> {
    google_parser() : google_parser::base_type(entry, "contacts") {
        using namespace qi;
        // an empty field yields the placeholder instead of being dropped
        value = +(char_ - ',' - eol) | attr("(unspecified)");
        entry = value >> ',' >> value >> ',' >> value >> eol;
        BOOST_SPIRIT_DEBUG_NODES((value)(entry))
    }
  private:
    qi::rule<Iterator, std::string()> value; // no skipper: implicitly a lexeme
    qi::rule<Iterator, data(), skipper> entry;
};
int main() {
    using It = std::string::const_iterator;
    google_parser<It> p;
    for (std::string input : { 
            "something, awful, is\n",
            "fine,,just\n",
            "like something missing: ,,\n",
        })
    {
        It f = input.begin(), l = input.end();
        data parsed;
        bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);
        if (ok)
            std::cout << "Parsed: '" << parsed.a << "', '" << parsed.b << "', '" << parsed.c << "'\n";
        else
            std::cout << "Parse failed\n";
        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}

Prints:

Parsed: 'something', 'awful', 'is'
Parsed: 'fine', '(unspecified)', 'just'
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'

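However, you have a bigger problem. The assumption that qi::repeat(2) [ value ] will parse into 2 strings does not hold.

repeat, like operator*, operator+ and operator%, parses into a container attribute. In this case the container attribute (a string) also receives the input from the second value:

Live On Coliru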

Parsed: 'somethingawful', 'is', ''
Parsed: 'fine(unspecified)', 'just', ''
Parsed: 'like something missing: (unspecified)', '(unspecified)', ''
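
The merging can be pictured with a rough analogy in plain C++. This is only a sketch and not Spirit's actual implementation: `store` and `repeat_into` are hypothetical names invented for the illustration. The point is that std::string is itself a container (of char), so appending each matched value to it concatenates the values, while a container of strings keeps them apart.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for storing one matched value into a
// container-of-strings attribute: each match becomes its own element.
inline void store(std::vector<std::string>& attr, const std::string& match) {
    attr.push_back(match);
}

// Hypothetical stand-in for the case where the bound attribute is a
// std::string, i.e. a container of char: the characters are appended.
inline void store(std::string& attr, const std::string& match) {
    attr.insert(attr.end(), match.begin(), match.end());
}

// Mimics a repeating parser feeding every match into one attribute.
template <typename Attr>
Attr repeat_into(const std::vector<std::string>& matches) {
    Attr attr;
    for (const auto& m : matches)
        store(attr, m);
    return attr;
}
```

Called as `repeat_into<std::string>({"something", "awful"})`, the sketch yields one merged string, which mirrors the first field in the output above; with `std::vector<std::string>` as the target, the two values stay separate.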

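Since this is not what you want, reconsider your data types:

  • either don't adapt the struct, and instead write a customization trait to assign the fields (see http://www.boost.org/doc/libs/1_57_0/libs/spirit/doc/html/spirit/advanced/customize.html)

  • change the struct to contain a vector of std::string, to match the exposed attribute

  • or create an auto-parser generator:

The auto_ approach:

If you teach Qi how to extract a single value, you can use a simple rule like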

entry = skip(skipper() | ',') [auto_] >> eol;

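This way, Spirit itself will generate the right number of value extractions for the given Fusion sequence!

Here is a quick and dirty approach:

CAVEAT Specializing directly for std::string like this might not be the best idea (it might not always be appropriate, and it might interact badly with other parsers). However, by default no create_parser<std::string> is defined (because, what would it do?), so I seized the opportunity for the purpose of this demonstration: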

namespace boost { namespace spirit { namespace traits {
    template <> struct create_parser<std::string> {
        typedef proto::result_of::deep_copy<
            BOOST_TYPEOF(
                qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
            )
        >::type type;
        static type call() {
            return proto::deep_copy(
                qi::lexeme [+(qi::char_ - ',' - qi::eol)] | qi::attr("(unspecified)")
            );
        }
    };
}}}

Again, see the demo output:

Live On Coliru

Parsed: 'something', 'awful', 'is'
Parsed: 'fine', 'just', '(unspecified)'
Parsed: 'like something missing: ', '(unspecified)', '(unspecified)'

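NOTE There is some advanced sorcery involved in getting the skipper to work "just right" (see skip()[] and lexeme[]). Some general explanation can be found here: Boost spirit skipper issues

UPDATE

The container approach

There is a subtlety to that. Two, actually. So here is a demo:

Live On Coliru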

//#define BOOST_SPIRIT_DEBUG
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;
struct data {
    std::vector<std::string> parts;
};
BOOST_FUSION_ADAPT_STRUCT(data, (std::vector<std::string>, parts))
template <typename Iterator, typename skipper = qi::blank_type>
struct google_parser : qi::grammar<Iterator, data(), skipper> {
    google_parser() : google_parser::base_type(entry, "contacts") {
        using namespace qi;
        // as<T> makes the wrapped expression synthesize one container
        qi::as<std::vector<std::string> > strings;
        value = +(char_ - ',' - eol) | attr("(unspecified)");
        entry = strings [ repeat(2) [ value >> ',' ] >> value ] >> eol;
        BOOST_SPIRIT_DEBUG_NODES((value)(entry))
    }
  private:
    qi::rule<Iterator, std::string()> value;
    qi::rule<Iterator, data(), skipper> entry;
};
int main() {
    using It = std::string::const_iterator;
    google_parser<It> p;
    for (std::string input : { 
            "something, awful, is\n",
            "fine,,just\n",
            "like something missing: ,,\n",
        })
    {
        It f = input.begin(), l = input.end();
        data parsed;
        bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);
        if (ok) {
            std::cout << "Parsed: ";
            for (auto& part : parsed.parts) 
                std::cout << "'" << part << "' ";
            std::cout << "\n";
        }
        else
            std::cout << "Parse failed\n";
        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}

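The subtleties are:

  • adapting a single-element sequence runs into edge cases with automatic attribute handling: Spirit Qi attribute propagation issue with single-member struct
  • in this particular case, Spirit needs hand-holding to treat repeat[...]>>value as synthesizing a single container /atomically/. The as<T> directive solves that here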
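
If named access to the columns is still wanted after switching to the vector layout, one possible way is a thin wrapper around the parsed parts. The following is a minimal sketch in plain C++ and not part of the parser above: the names `Field`, `record` and `make_record` are hypothetical, and the three enumerators stand in for the 60 real columns.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column names; a real record would list all 60 here.
enum class Field : std::size_t { a, b, c, count };

struct record {
    std::vector<std::string> parts; // same layout as data::parts

    // Named, bounds-checked access to one column.
    const std::string& get(Field f) const {
        return parts.at(static_cast<std::size_t>(f));
    }
};

// Reject rows whose number of columns differs from the enum's count.
inline record make_record(std::vector<std::string> parts) {
    if (parts.size() != static_cast<std::size_t>(Field::count))
        throw std::invalid_argument("unexpected number of CSV fields");
    return record{std::move(parts)};
}
```

With this, `make_record(parsed.parts).get(Field::b)` would read the second column by name, and a row with the wrong number of fields is rejected up front instead of silently shifting values.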