Left factorisation in Parsing Expression Grammar
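I am writing a grammar for a language that allows the following expressions:
- function calls of the form f args (note: no parentheses!)
- additions of the form a + b (and more complex expressions, but that is not the point here)
For example: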
f 42 => f(42)
42 + b => (42 + b)
f 42 + b => f(42 + b)
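The grammar is unambiguous (every expression can be parsed in exactly one way), but I don't know how to write it as a PEG, since both productions may start with the same token, id. This is my faulty PEG. How can I rewrite it so that it works?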
expression ::= call / addition
call       ::= id addition*
addition   ::= unary
               ( ('+' unary)
               / ('-' unary) )*
unary      ::= primary
             / '(' ( ('+' unary)
                   / ('-' unary)
                   / expression)
               ')'
primary    ::= number / id
number     ::= [1-9]+
id         ::= [a-z]+
Now, when this grammar tries to parse the input "a + b", it parses "a" as a function call with zero arguments and then chokes on "+ b".
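To make the failure concrete, here is a minimal hand-written recognizer for the PEG above, using only the standard library. It is a sketch under assumptions, not the Spirit implementation mentioned in this post: the names Recognizer and unparsed are made up for illustration, and skipping spaces between tokens is my assumption since the grammar does not say anything about whitespace.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical recognizer for the faulty PEG. Every function restores `pos`
// when it fails, which is how PEG ordered choice backtracks.
struct Recognizer {
    std::string in;
    std::size_t pos = 0;
    explicit Recognizer(std::string s) : in(std::move(s)) {}

    // Assumption: blanks between tokens are insignificant.
    void skip() { while (pos < in.size() && in[pos] == ' ') ++pos; }

    bool lit(char c) {
        std::size_t save = pos;
        skip();
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        pos = save;
        return false;
    }

    // Matches a non-empty run of characters in [lo, hi].
    bool run(char lo, char hi) {
        std::size_t save = pos;
        skip();
        std::size_t start = pos;
        while (pos < in.size() && in[pos] >= lo && in[pos] <= hi) ++pos;
        if (pos > start) return true;
        pos = save;
        return false;
    }

    bool number()  { return run('1', '9'); }        // number ::= [1-9]+
    bool id()      { return run('a', 'z'); }        // id ::= [a-z]+
    bool primary() { return number() || id(); }     // primary ::= number / id

    // One "op unary" pair, undone as a whole if the operand is missing.
    bool signedUnary(char op) {
        std::size_t save = pos;
        if (lit(op) && unary()) return true;
        pos = save;
        return false;
    }

    bool unary() {
        if (primary()) return true;
        std::size_t save = pos;
        if (lit('(') &&
            (signedUnary('+') || signedUnary('-') || expression()) &&
            lit(')'))
            return true;
        pos = save;
        return false;
    }

    bool addition() {
        if (!unary()) return false;
        while (signedUnary('+') || signedUnary('-')) {}
        return true;
    }

    // call ::= id addition*   (zero arguments are accepted: the culprit)
    bool call() {
        if (!id()) return false;
        while (addition()) {}
        return true;
    }

    // expression ::= call / addition   (call is tried first and wins)
    bool expression() { return call() || addition(); }
};

// Returns whatever input is left over after matching one expression.
std::string unparsed(const std::string& input) {
    Recognizer r(input);
    r.expression();
    r.skip();
    return r.in.substr(r.pos);
}
```

With this sketch, unparsed("a + b") returns "+ b": call succeeds on "a" alone, so the ordered choice never gets to try addition. By contrast, unparsed("f 42 + b") and unparsed("42 + b") both return an empty string.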
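I have uploaded a C++/Boost.Spirit.Qi implementation of the grammar in case anyone wants to play with it.
(Note that unary disambiguates unary operations from additions: to call a function with a negative number as its argument, you have to use parentheses, e.g. f (-1).)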
Following a suggestion made in chat, you could start with something like this:
expression = addition | simple;
addition   = simple >>
    (  ('+' > expression)
     | ('-' > expression)
    );
simple     = '(' > expression > ')' | call | unary | number;
call       = id >> *expression;
unary      = qi::char_("-+") > expression;
// terminals
id         = qi::lexeme[+qi::char_("a-z")];
number     = qi::double_;
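The ordering idea in that grammar (try addition, which needs at least one operator, before falling back to simple) can be tried out without Boost. What follows is a rough hand-written sketch of a reduced variant, not the Spirit code itself. The assumptions are mine: the prefix unary rule is left out, numbers are plain digit runs instead of qi::double_, the addition rule loops over operators the way the full grammar further down does, and the Parser and parse names are invented for the example. It prints the tree as a string instead of building an AST.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical reduced grammar (assumption: no prefix unary rule):
//   expression = addition | simple
//   addition   = simple (('+' | '-') simple)+
//   simple     = '(' expression ')' | call | number
//   call       = id expression*
struct Parser {
    std::string in;
    std::size_t pos = 0;
    explicit Parser(std::string s) : in(std::move(s)) {}

    void skip() { while (pos < in.size() && in[pos] == ' ') ++pos; }

    bool lit(char c) {
        std::size_t save = pos;
        skip();
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        pos = save;
        return false;
    }

    // Reads a non-empty run of characters in [lo, hi] into out.
    bool run(char lo, char hi, std::string& out) {
        std::size_t save = pos;
        skip();
        std::size_t start = pos;
        while (pos < in.size() && in[pos] >= lo && in[pos] <= hi) ++pos;
        if (pos == start) { pos = save; return false; }
        out = in.substr(start, pos - start);
        return true;
    }

    bool expression(std::string& out) { return addition(out) || simple(out); }

    bool addition(std::string& out) {
        std::size_t save = pos;
        std::string lhs;
        if (!simple(lhs)) return false;
        bool any = false;
        for (;;) {
            std::size_t before = pos;
            char op = lit('+') ? '+' : lit('-') ? '-' : '\0';
            std::string rhs;
            if (op == '\0' || !simple(rhs)) { pos = before; break; }
            lhs = "(" + lhs + " " + op + " " + rhs + ")";  // left-nested
            any = true;
        }
        if (!any) { pos = save; return false; }  // no operator: backtrack
        out = lhs;
        return true;
    }

    bool simple(std::string& out) {
        std::size_t save = pos;
        std::string inner;
        if (lit('(') && expression(inner) && lit(')')) { out = inner; return true; }
        pos = save;
        return call(out) || run('0', '9', out);
    }

    bool call(std::string& out) {
        std::string name;
        if (!run('a', 'z', name)) return false;
        std::string args, arg;
        while (expression(arg)) {
            if (!args.empty()) args += ", ";
            args += arg;
        }
        out = name + "(" + args + ")";
        return true;
    }
};

// Returns the printed tree, or "<error>" unless the whole input matched.
std::string parse(const std::string& input) {
    Parser p(input);
    std::string out;
    if (!p.expression(out)) return "<error>";
    p.skip();
    return p.pos == p.in.size() ? out : "<error>";
}
```

In this reduced variant, parse("a + b") gives "(a() + b())" and parse("f 42 + b") gives "f((42 + b()))". Every identifier shows up as a call, possibly with no arguments, because id is only reachable through call. Note that a failed addition makes the parser re-read the same simple, which is fine for short inputs but gets expensive with deep nesting.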
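I have since implemented this in C++ with an AST demonstration, so you can get a feel for how this grammar builds the expression tree by pretty-printing it.
All the source code is on GitHub: https://gist.github.com/2152518
There are two versions (scroll down to "Alternative" to read more).
The grammar: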
template <typename Iterator>
struct mini_grammar : qi::grammar<Iterator, expression_t(), qi::space_type>
{
    qi::rule<Iterator, std::string(),  qi::space_type> id;
    qi::rule<Iterator, expression_t(), qi::space_type> addition, expression, simple;
    qi::rule<Iterator, number_t(),     qi::space_type> number;
    qi::rule<Iterator, call_t(),       qi::space_type> call;
    qi::rule<Iterator, unary_t(),      qi::space_type> unary;

    mini_grammar() : mini_grammar::base_type(expression)
    {
        expression = addition | simple;
        addition   = simple [ qi::_val = qi::_1 ] >>
            +(
                (qi::char_("+-") > simple) [ phx::bind(&append_term, qi::_val, qi::_1, qi::_2) ]
             );
        simple     = '(' > expression > ')' | call | unary | number;
        call       = id >> *expression;
        unary      = qi::char_("-+") > expression;
        // terminals
        id         = qi::lexeme[+qi::char_("a-z")];
        number     = qi::double_;
    }
};
The corresponding AST structures are defined quick-and-dirty using the very powerful Boost Variant:
struct addition_t;
struct call_t;
struct unary_t;
typedef double number_t;

typedef boost::variant<
    number_t,
    boost::recursive_wrapper<call_t>,
    boost::recursive_wrapper<unary_t>,
    boost::recursive_wrapper<addition_t>
    > expression_t;

struct addition_t
{
    expression_t lhs;
    char binop;
    expression_t rhs;
};

struct call_t
{
    std::string id;
    std::vector<expression_t> args;
};

struct unary_t
{
    char unop;
    expression_t operand;
};

BOOST_FUSION_ADAPT_STRUCT(addition_t, (expression_t, lhs)(char,binop)(expression_t, rhs));
BOOST_FUSION_ADAPT_STRUCT(call_t, (std::string, id)(std::vector<expression_t>, args));
BOOST_FUSION_ADAPT_STRUCT(unary_t, (char, unop)(expression_t, operand));
In the full code, I have also overloaded operator<< for these structures.
Full demo
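If you want to experiment with the tree shape without Boost, roughly the same recursive AST can be sketched with C++17 std::variant. This is an assumption-laden analogue, not the code from the gist: std::shared_ptr stands in for boost::recursive_wrapper, a visitor replaces the operator<< overloads, and the *_node, printer, render and make_sample names are made up for the example.

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

struct addition_node;
struct call_node;
struct unary_node;

// shared_ptr breaks the recursion, playing the role of recursive_wrapper.
using expr = std::variant<double,
                          std::shared_ptr<call_node>,
                          std::shared_ptr<unary_node>,
                          std::shared_ptr<addition_node>>;

struct addition_node { expr lhs; char binop; expr rhs; };
struct call_node     { std::string id; std::vector<expr> args; };
struct unary_node    { char unop; expr operand; };

// Visitor that pretty-prints a tree, one overload per alternative.
struct printer {
    std::ostream& os;

    void operator()(double d) const { os << d; }

    void operator()(const std::shared_ptr<call_node>& c) const {
        os << c->id << "(";
        for (std::size_t i = 0; i < c->args.size(); ++i) {
            if (i) os << ", ";
            std::visit(*this, c->args[i]);
        }
        os << ")";
    }

    void operator()(const std::shared_ptr<unary_node>& u) const {
        os << "(" << u->unop << ' ';
        std::visit(*this, u->operand);
        os << ")";
    }

    void operator()(const std::shared_ptr<addition_node>& a) const {
        os << "(";
        std::visit(*this, a->lhs);
        os << ' ' << a->binop << ' ';
        std::visit(*this, a->rhs);
        os << ")";
    }
};

std::string render(const expr& e) {
    std::ostringstream os;
    std::visit(printer{os}, e);
    return os.str();
}

// Builds the tree for "f 42 + 1" by hand, i.e. f(42 + 1).
expr make_sample() {
    auto sum = std::make_shared<addition_node>(addition_node{42.0, '+', 1.0});
    auto call = std::make_shared<call_node>();
    call->id = "f";
    call->args.push_back(sum);
    return call;
}
```

Here render(make_sample()) yields "f((42 + 1))". The sketch does no parsing. It only shows how the variant nests calls, unary operations and additions.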
//#define BOOST_SPIRIT_DEBUG
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/optional.hpp>
#include <boost/variant.hpp>

namespace qi = boost::spirit::qi;
namespace phx= boost::phoenix;

struct addition_t;
struct call_t;
struct unary_t;
typedef double number_t;

typedef boost::variant<
    number_t,
    boost::recursive_wrapper<call_t>,
    boost::recursive_wrapper<unary_t>,
    boost::recursive_wrapper<addition_t>
    > expression_t;

struct addition_t
{
    expression_t lhs;
    char binop;
    expression_t rhs;

    friend std::ostream& operator<<(std::ostream& os, const addition_t& a)
        { return os << "(" << a.lhs << ' ' << a.binop << ' ' << a.rhs << ")"; }
};

struct call_t
{
    std::string id;
    std::vector<expression_t> args;

    friend std::ostream& operator<<(std::ostream& os, const call_t& a)
        { os << a.id << "("; for (auto& e : a.args) os << e << ", "; return os << ")"; }
};

struct unary_t
{
    char unop;
    expression_t operand;

    friend std::ostream& operator<<(std::ostream& os, const unary_t& a)
        { return os << "(" << a.unop << ' ' << a.operand << ")"; }
};

BOOST_FUSION_ADAPT_STRUCT(addition_t, (expression_t, lhs)(char,binop)(expression_t, rhs));
BOOST_FUSION_ADAPT_STRUCT(call_t, (std::string, id)(std::vector<expression_t>, args));
BOOST_FUSION_ADAPT_STRUCT(unary_t, (char, unop)(expression_t, operand));

// Wraps the tree built so far as the left-hand side of a new addition node
void append_term(expression_t& lhs, char op, expression_t operand)
{
    lhs = addition_t { lhs, op, operand };
}

template <typename Iterator>
struct mini_grammar : qi::grammar<Iterator, expression_t(), qi::space_type>
{
    qi::rule<Iterator, std::string(),  qi::space_type> id;
    qi::rule<Iterator, expression_t(), qi::space_type> addition, expression, simple;
    qi::rule<Iterator, number_t(),     qi::space_type> number;
    qi::rule<Iterator, call_t(),       qi::space_type> call;
    qi::rule<Iterator, unary_t(),      qi::space_type> unary;

    mini_grammar() : mini_grammar::base_type(expression)
    {
        expression = addition | simple;
        addition   = simple [ qi::_val = qi::_1 ] >>
            +(
                (qi::char_("+-") > simple) [ phx::bind(&append_term, qi::_val, qi::_1, qi::_2) ]
             );
        simple     = '(' > expression > ')' | call | unary | number;
        call       = id >> *expression;
        unary      = qi::char_("-+") > expression;
        // terminals
        id         = qi::lexeme[+qi::char_("a-z")];
        number     = qi::double_;

        BOOST_SPIRIT_DEBUG_NODE(expression);
        BOOST_SPIRIT_DEBUG_NODE(call);
        BOOST_SPIRIT_DEBUG_NODE(addition);
        BOOST_SPIRIT_DEBUG_NODE(simple);
        BOOST_SPIRIT_DEBUG_NODE(unary);
        BOOST_SPIRIT_DEBUG_NODE(id);
        BOOST_SPIRIT_DEBUG_NODE(number);
    }
};

// Reads the whole stream into a string
std::string read_input(std::istream& stream) {
    return std::string(
        std::istreambuf_iterator<char>(stream),
        std::istreambuf_iterator<char>());
}

int main() {
    std::cin.unsetf(std::ios::skipws);
    std::string const code = read_input(std::cin);
    auto begin = code.begin();
    auto end = code.end();

    try {
        mini_grammar<decltype(end)> grammar;
        qi::space_type space;

        // The input is a sequence of expressions, each terminated by ';'
        std::vector<expression_t> script;
        bool ok = qi::phrase_parse(begin, end, *(grammar > ';'), space, script);

        if (begin!=end)
            std::cerr << "Unparsed: '" << std::string(begin,end) << "'\n";

        std::cout << std::boolalpha << "Success: " << ok << "\n";

        if (ok)
        {
            for (auto& expr : script)
                std::cout << "AST: " << expr << '\n';
        }
    }
    catch (qi::expectation_failure<decltype(end)> const& ex) {
        std::cout << "Failure; parsing stopped after \""
                  << std::string(ex.first, ex.last) << "\"\n";
    }
}
Alternative:
I have an alternative version that builds addition_t iteratively rather than recursively, so to speak:
struct term_t
{
    char binop;
    expression_t rhs;
};

struct addition_t
{
    expression_t lhs;
    std::vector<term_t> terms;
};
This removes the need to use Phoenix to build the expression:
addition = simple >> +term;
term = qi::char_("+-") > simple;
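To illustrate what the flat representation gives you, here is a small sketch that walks such a list of terms from left to right. It is simplified on purpose and the simplification is my assumption: operands are plain double values rather than expression_t, and the term, addition, evaluate and nested names are invented for the example. Folding the terms in order recovers the same left-associative result that the recursive version encodes in its nesting.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Assumption: operands are plain doubles, not full expression trees.
struct term     { char binop; double rhs; };
struct addition { double lhs; std::vector<term> terms; };

// Left fold over the terms: ((lhs op1 rhs1) op2 rhs2) ...
double evaluate(const addition& a) {
    double acc = a.lhs;
    for (const term& t : a.terms)
        acc = (t.binop == '+') ? acc + t.rhs : acc - t.rhs;
    return acc;
}

// Prints the left-nested form that the flat list stands for.
std::string nested(const addition& a) {
    std::ostringstream first;
    first << a.lhs;
    std::string out = first.str();
    for (const term& t : a.terms) {
        std::ostringstream rhs;
        rhs << t.rhs;
        out = "(" + out + " " + t.binop + " " + rhs.str() + ")";
    }
    return out;
}
```

For example, with addition a{1.0, {{'-', 2.0}, {'+', 3.0}}}, nested(a) gives "((1 - 2) + 3)" and evaluate(a) gives 2.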