Boost spirit: how to use custom logic when parsing a list of doubles with text specifiers

Keywords: list, custom, text specifiers. Last updated: 2023-10-16

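I want to parse a vector of doubles. However, this vector may also contain two kinds of statements that compress the data somewhat: FOR and RAMP.

If FOR is in the string, its format should be "<double> FOR <int>". It means: repeat the <double> <int> times.

For example, "1 1.5 2 2.5 3 FOR 4 3.5" should parse to { 1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5 }

If RAMP is in the string, its format should be "<double1> RAMP <int> <double2>". It means: linearly interpolate between <double1> and <double2> over <int> periods.

For example, "1 2 3 4 RAMP 3 6 7 8" should parse to { 1, 2, 3, 4, 5, 6, 7, 8 }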
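
For reference, the intended semantics can be written down independently of any parser. What follows is a minimal sketch in plain C++. `apply_for` and `apply_ramp` are hypothetical helper names, and treating the value in front of the keyword as already present in the output is an assumption about how one might structure the expansion, not something the question prescribes.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper: "<double> FOR <n>" means the last value occurs n times in total.
inline void apply_for(std::vector<double>& out, unsigned n) {
    if (out.empty() || n == 0)
        throw std::invalid_argument("FOR needs a preceding value and a count >= 1");
    const double last = out.back();      // copy first, the insert may reallocate
    out.insert(out.end(), n - 1, last);  // the value is already present once
}

// Hypothetical helper: "<a> RAMP <n> <b>" means n evenly spaced points from a to b,
// where a is the value already at the back of the vector.
inline void apply_ramp(std::vector<double>& out, unsigned n, double target) {
    if (out.empty())
        throw std::invalid_argument("RAMP needs a preceding value");
    const double start = out.back();
    if (n == 0 || (n == 1 && start != target))
        throw std::invalid_argument("bad interpolation");
    if (n >= 2) {
        const double step = (target - start) / (n - 1);
        for (unsigned i = 1; i + 1 < n; ++i)  // interior points only
            out.push_back(start + i * step);
        out.push_back(target);                // end point is stored exactly
    }
}
```

Starting from `{1, 2, 3, 4}`, calling `apply_ramp(v, 3, 6)` appends `5` and `6`, which matches the RAMP example above.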

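I don't know how to go beyond defining parsers for the individual elements. How do I supply custom code that performs the expansion when one of these statements is encountered?

Thanks!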

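The simplest approach, without semantic actions¹, is to parse into an AST and then interpret that.

The more tedious approach is to use semantic actions to build the result. (Keep in mind that this runs into problems with backtracking grammars.)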
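
To make the backtracking caveat concrete, here is a toy illustration in plain C++. It is deliberately not Spirit, and `parse_with_eager_action` is a made-up name. It mimics an alternative of the shape `(digit 'x') | digit` in which each branch performs its side effect as soon as the digit matches, before it is known whether the whole branch will succeed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy matcher (an illustration, not Boost.Spirit).
// Grammar being mimicked:  (digit 'x') | digit
// Each branch "fires its action" (a push_back) as soon as the digit matches.
inline std::vector<int> parse_with_eager_action(const std::string& in) {
    std::vector<int> out;
    auto digit_at = [&in](std::size_t pos) {
        return pos < in.size() &&
               std::isdigit(static_cast<unsigned char>(in[pos])) != 0;
    };

    // Branch 1: a digit followed by 'x'
    if (digit_at(0)) {
        out.push_back(in[0] - '0');          // the action runs immediately
        if (in.size() > 1 && in[1] == 'x')
            return out;                      // branch 1 matched
        // Branch 1 failed: the input position is rewound to 0,
        // but the push_back above is NOT rolled back.
    }

    // Branch 2: a lone digit, starting again from position 0
    if (digit_at(0))
        out.push_back(in[0] - '0');
    return out;
}
```

Calling it with `"7"` leaves two entries in the vector, while `"7x"` leaves one. Interpreting an AST only after the parse has succeeded as a whole avoids this kind of leftover side effect.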

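I have written similar answers before:

Parse comma-separated list of ranges and numbers with semantic actions
A regex-based approach to range expressions, here: Expand a-z to abc...xyz form
A competing version using C as the approach for performance: Expand a-z to abc...xyz form

Without further ado:

Using an AST Representation

An example AST: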

namespace AST {
    using N = unsigned long;
    using V = double;

    struct repeat { N n; V value; };
    struct interpolate {
        N n; V start, end;
        bool is_valid() const;
    };

    using element = boost::variant<repeat, interpolate>;
    using elements = std::vector<element>;

is_valid is a good place to do logic assertions such as "the number of periods isn't zero" or "if the number of periods is 1, start and end must coincide".
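
One way those two checks might look, as a standalone sketch. The `sketch::interpolate` struct below is a hypothetical mirror of `AST::interpolate`, and the body is an assumption derived from the two rules just named, not the definition used in the answer's actual listing.

```cpp
namespace sketch {
    using N = unsigned long;
    using V = double;

    // Hypothetical stand-in for AST::interpolate with one possible is_valid().
    struct interpolate {
        N n; V start, end;

        bool is_valid() const {
            if (n == 0) return false;         // the number of periods must be non-zero
            if (n == 1) return start == end;  // one period: start and end must coincide
            return true;                      // two or more periods always interpolate
        }
    };
}
```

The exact floating-point comparison for the single-period case is a choice made for this sketch; a tolerance might be preferable depending on where the numbers come from.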

Now, for our end result, we want to transform this into just a vector of V:

    using values = std::vector<V>;

    static inline values expand(elements const& v) {
        struct {
            values result;
            // a plain number arrives as repeat{1, value}
            void operator()(repeat const& e) {
                result.insert(result.end(), e.n, e.value);
            }
            void operator()(interpolate const& e) {
                if (!e.is_valid()) {
                    throw std::runtime_error("bad interpolation");
                }
                if (e.n>0) { result.push_back(e.start); }
                if (e.n>2) {
                    // interior points only; both end points are pushed exactly
                    auto const delta = (e.end-e.start)/(e.n-1);
                    for (N i=1; i<(e.n-1); ++i)
                        result.push_back(e.start + i * delta);
                }
                if (e.n>1) { result.push_back(e.end); }
            }
        } visitor;
        for (auto& el : v) {
            boost::apply_visitor(visitor, el);
        }
        return std::move(visitor.result);
    }
}

Now that we have the basics down, let's parse and test:

Parsing

First, let's adapt the AST types:

BOOST_FUSION_ADAPT_STRUCT(AST::repeat, value, n)
BOOST_FUSION_ADAPT_STRUCT(AST::interpolate, start, n, end)

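Note: adapting the attributes in "natural grammar order" makes attribute propagation painless, without any semantic actions.

Now let's roll a grammar: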

namespace qi = boost::spirit::qi;

template <typename It> struct Grammar : qi::grammar<It, AST::elements()> {
    Grammar() : Grammar::base_type(start) {
        elements_ = *element_;
        element_ = interpolate_ | repeat_;
        repeat_
            = value_ >> "FOR" >> qi::uint_
            | value_ >> qi::attr(1u)
            ;
        interpolate_
            = value_ >> "RAMP" >> qi::uint_ >> value_
            ;
        value_ = qi::auto_;
        start = qi::skip(qi::space) [ elements_ ];
        BOOST_SPIRIT_DEBUG_NODES((start)(elements_)(element_)(repeat_)(interpolate_)(value_))
    }
  private:
    qi::rule<It, AST::elements()> start;
    qi::rule<It, AST::elements(),    qi::space_type> elements_;
    qi::rule<It, AST::element(),     qi::space_type> element_;
    qi::rule<It, AST::repeat(),      qi::space_type> repeat_;
    qi::rule<It, AST::interpolate(), qi::space_type> interpolate_;
    qi::rule<It, AST::V(),           qi::space_type> value_;
};

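Notes:
BOOST_SPIRIT_DEBUG_NODES enables rule debugging
the ordering of interpolate_ | repeat_ matters, because repeat_ also parses individual numbers (so it would prevent RAMP from being parsed in time).

A simple utility to invoke the parser and also expand() the intermediate representation: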

AST::values do_parse(std::string const& input) {
    static const Grammar<std::string::const_iterator> g;
    auto f = begin(input), l = end(input);
    AST::elements intermediate;
    if (!qi::parse(f, l, g >> qi::eoi, intermediate)) {
        throw std::runtime_error("bad input");
    }
    return expand(intermediate);
}

Testing

The proof of the pudding is in the eating:

Live On Coliru

int main() {
    std::cout << std::boolalpha;
    struct { std::string input; AST::values expected; } cases[] = {
        { "1 1.5 2 2.5 3 FOR 4 3.5", { 1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5 } },
        { "1 2 3 4 RAMP 3 6 7 8", { 1, 2, 3, 4, 5, 6, 7, 8 } },
    };
    for (auto const& test : cases) {
        try {
            std::cout << std::quoted(test.input) << " -> ";
            auto actual = Parse::do_parse(test.input);
            std::cout << (actual==test.expected? "PASSED":"FAILED") << " { ";
            // print the actual for reference
            std::cout << " {";
            for (auto& v : actual) std::cout << v << ", ";
            std::cout << "}\n";
        } catch(std::exception const& e) {
            std::cout << "ERROR " << std::quoted(e.what()) << "\n";
        }
    }
}

Prints

"1 1.5 2 2.5 3 FOR 4 3.5" -> PASSED {  {1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5, }
"1 2 3 4 RAMP 3 6 7 8" -> PASSED {  {1, 2, 3, 4, 5, 6, 7, 8, }

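Using Semantic Actions Instead

This might be more efficient, and I found that I actually prefer the expressiveness of this approach.

It might not scale well as the grammar grows more complicated, though.

Here we "invert" the flow: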

Grammar() : Grammar::base_type(start) {
    element_ =
          qi::double_                          [ px::push_back(qi::_val, qi::_1) ]
        | ("FOR" >> qi::uint_)                 [ handle_for(qi::_val, qi::_1) ]
        | ("RAMP" >> qi::uint_ >> qi::double_) [ handle_ramp(qi::_val, qi::_1, qi::_2) ]
        ;
    start = qi::skip(qi::space) [ *element_ ];
}

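Here, handle_for and handle_ramp in the semantic actions are lazy Actors that do essentially the same work as expand() did in the AST-based approach, but

on the fly
the first operand is implicit (it is the last value already at the back of the vector)

This calls for a few extra checks (we don't want UB when the user passes a string that starts with "FOR" or "RAMP"):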

struct handle_for_f {
    void operator()(Values& vec, unsigned n) const {
        if (vec.empty() || n<1)
            throw std::runtime_error("bad quantifier");
        vec.insert(vec.end(), n-1, vec.back());
    }
};

struct handle_ramp_f {
    void operator()(Values& vec, unsigned n, double target) const {
        if (vec.empty())
            throw std::runtime_error("bad quantifier");
        if ((n == 0) || (n == 1 && (vec.back() != target)))
            throw std::runtime_error("bad interpolation");
        auto start = vec.back();
        if (n>2) {
            auto const delta = (target-start)/(n-1);
            for (std::size_t i=1; i<(n-1); ++i)
                vec.push_back(start + i * delta);
        }
        if (n>1) { vec.push_back(target); }
    }
};

To avoid tedious boost::phoenix::bind calls in the semantic actions, let's adapt these as Phoenix functions:

px::function<handle_for_f> handle_for;
px::function<handle_ramp_f> handle_ramp;

Parsing

The do_parse helper becomes simpler because we have no intermediate representation:

Values do_parse(std::string const& input) {
    static const Grammar<std::string::const_iterator> g;
    auto f = begin(input), l = end(input);
    Values values;
    if (!qi::parse(f, l, g >> qi::eoi, values)) {
        throw std::runtime_error("bad input");
    }
    return values;
}

Testing

Again, the proof of the pudding is in the eating. With the test program's main() unmodified:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <string>
#include <vector>

using Values = std::vector<double>;

namespace Parse {
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;

    template <typename It> struct Grammar : qi::grammar<It, Values()> {
        Grammar() : Grammar::base_type(start) {
            element_ =
                  qi::double_                          [ px::push_back(qi::_val, qi::_1) ]
                | ("FOR" >> qi::uint_)                 [ handle_for(qi::_val, qi::_1) ]
                | ("RAMP" >> qi::uint_ >> qi::double_) [ handle_ramp(qi::_val, qi::_1, qi::_2) ]
                ;
            start = qi::skip(qi::space) [ *element_ ];
        }
      private:
        qi::rule<It, Values()> start;
        qi::rule<It, Values(), qi::space_type> element_;

        // repeats the last value so that it occurs n times in total
        struct handle_for_f {
            void operator()(Values& vec, unsigned n) const {
                if (vec.empty() || n<1)
                    throw std::runtime_error("bad quantifier");
                vec.insert(vec.end(), n-1, vec.back());
            }
        };

        // interpolates from the last value to target over n points
        struct handle_ramp_f {
            void operator()(Values& vec, unsigned n, double target) const {
                if (vec.empty())
                    throw std::runtime_error("bad quantifier");
                if ((n == 0) || (n == 1 && (vec.back() != target)))
                    throw std::runtime_error("bad interpolation");
                auto start = vec.back();
                if (n>2) {
                    auto const delta = (target-start)/(n-1);
                    for (std::size_t i=1; i<(n-1); ++i)
                        vec.push_back(start + i * delta);
                }
                if (n>1) { vec.push_back(target); }
            }
        };

        px::function<handle_for_f> handle_for;
        px::function<handle_ramp_f> handle_ramp;
    };

    Values do_parse(std::string const& input) {
        static const Grammar<std::string::const_iterator> g;
        auto f = begin(input), l = end(input);
        Values values;
        if (!qi::parse(f, l, g >> qi::eoi, values)) {
            throw std::runtime_error("bad input");
        }
        return values;
    }
}

int main() {
    std::cout << std::boolalpha;
    struct { std::string input; Values expected; } cases[] = {
        { "1 1.5 2 2.5 3 FOR 4 3.5", { 1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5 } },
        { "1 2 3 4 RAMP 3 6 7 8", { 1, 2, 3, 4, 5, 6, 7, 8 } },
    };
    for (auto const& test : cases) {
        try {
            std::cout << std::quoted(test.input) << " -> ";
            auto actual = Parse::do_parse(test.input);
            std::cout << (actual==test.expected? "PASSED":"FAILED") << " { ";
            // print the actual for reference
            std::cout << " {";
            for (auto& v : actual) std::cout << v << ", ";
            std::cout << "}\n";
        } catch(std::exception const& e) {
            std::cout << "ERROR " << std::quoted(e.what()) << "\n";
        }
    }
}

Prints the same as before:

"1 1.5 2 2.5 3 FOR 4 3.5" -> PASSED {  {1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5, }
"1 2 3 4 RAMP 3 6 7 8" -> PASSED {  {1, 2, 3, 4, 5, 6, 7, 8, }

¹ Boost Spirit: "Semantic actions are evil"?

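This is what I ended up going with. It uses semantic actions, but it is simpler than @sehe's probably more correct answer: no templated functions, no Phoenix, no custom grammar struct.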

#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;

std::vector<double> parseVector(const std::string & vec_str)
{
    std::vector<double> vec;

    // vec is never empty here: vec_rule always starts with a double
    auto for_handler = [&vec, &vec_str](const unsigned &len) {
        if (len == 0)
            throw std::runtime_error("Invalid vector: " + vec_str);
        vec.insert(vec.end(), len - 1, vec.back());
    };
    auto ramp_handler = [&vec, &vec_str](const fusion::vector<unsigned, double> & vals) {
        double start = vec.back();
        double target = fusion::at_c<1>(vals);
        unsigned len = fusion::at_c<0>(vals);
        if (len == 0 || (len == 1 && start != target))
            throw std::runtime_error("Invalid vector: " + vec_str);
        if (len >= 2) {
            for (unsigned i = 0; i < len - 2; i++)
                vec.push_back(start + (i + 1) * (target - start) / (len - 1));
            vec.push_back(target);
        }
    };
    auto double_handler = [&vec](const double &val) {
        vec.push_back(val);
    };

    // qi::copy deep-copies each expression; a bare `auto` would keep
    // references to temporaries that are gone after the statement ends
    auto for_rule = qi::copy(qi::no_case[qi::lit("for") | qi::lit('*')] >> qi::uint_);
    auto ramp_rule = qi::copy(qi::no_case[qi::lit("ramp")] >> qi::uint_ >> qi::double_);
    auto vec_rule = qi::copy(qi::double_[double_handler] >> *(for_rule[for_handler] | ramp_rule[ramp_handler] | qi::double_[double_handler]));

    auto it = vec_str.begin();
    if (!qi::phrase_parse(it, vec_str.end(), vec_rule, qi::ascii::space) || it != vec_str.end())
        throw std::runtime_error("Invalid vector: " + vec_str);
    return vec;
}

Now, if only I could get "1 for 4.5 1" to throw an error instead of resolving to { 1, 1, 1, 1, 0.5, 1 }. Sigh.
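
One way to get that error, sketched here without Spirit as a swapped-in technique: split the input on whitespace and require every token to convert completely, so that `4.5` after `for` is rejected as a count instead of being read as `4` followed by `.5`. All names (`strict_sketch`, `to_double`, `to_count`, `parse_vector_strict`) are hypothetical, and unlike the Spirit version this assumes keywords and numbers are always separated by whitespace.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace strict_sketch {

// Converts a whole token to double; throws if any character is left over.
inline double to_double(const std::string& tok) {
    std::size_t used = 0;
    double v = 0;
    try { v = std::stod(tok, &used); } catch (const std::exception&) { used = 0; }
    if (used == 0 || used != tok.size())
        throw std::runtime_error("bad number: " + tok);
    return v;
}

// Accepts only tokens made of decimal digits, so "4.5" is not a count.
inline unsigned long to_count(const std::string& tok) {
    const bool digits_only = !tok.empty() &&
        std::all_of(tok.begin(), tok.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!digits_only)
        throw std::runtime_error("bad count: " + tok);
    return std::stoul(tok);
}

inline std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

inline std::vector<double> parse_vector_strict(const std::string& text) {
    std::istringstream in(text);
    std::vector<double> out;
    std::string tok;
    while (in >> tok) {
        const std::string key = lower(tok);
        if (key == "for" || key == "*") {
            std::string count;
            if (out.empty() || !(in >> count))
                throw std::runtime_error("dangling FOR in: " + text);
            const unsigned long n = to_count(count);
            if (n == 0)
                throw std::runtime_error("zero repeat count in: " + text);
            const double last = out.back();  // copy before the vector grows
            out.insert(out.end(), n - 1, last);
        } else if (key == "ramp") {
            std::string count, target_tok;
            if (out.empty() || !(in >> count >> target_tok))
                throw std::runtime_error("dangling RAMP in: " + text);
            const unsigned long n = to_count(count);
            const double start = out.back();
            const double target = to_double(target_tok);
            if (n == 0 || (n == 1 && start != target))
                throw std::runtime_error("bad interpolation in: " + text);
            for (unsigned long i = 1; i + 1 < n; ++i)  // interior points
                out.push_back(start + i * (target - start) / (n - 1));
            if (n >= 2) out.push_back(target);
        } else {
            out.push_back(to_double(tok));
        }
    }
    return out;
}

} // namespace strict_sketch
```

With this sketch, `parse_vector_strict("1 for 4.5 1")` throws, while the two inputs from the question expand as before. Whether something similar can be expressed inside the Qi grammar itself is left open here.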