RE2 not matching non-ASCII characters

Keywords: characters, ASCII, not matching, RE2      Updated: 2023-10-16

I can't get RE2 to match a byte (a non-ASCII one) using its hex/octal representation.

The following snippet illustrates the problem:

#include <string>
#include <re2/re2.h>

const char *test = "abc""\xe2""xyz";
std::string str(test); // "abc\342xyz"; \342 is the octal form of \xe2
// str.size() == 7
re2::StringPiece string_piece(str); // size is 7, as expected
std::string out;
// Extracts the letter 'z' into 'out'; \172 is the octal form of 'z'.
bool match = re2::RE2::PartialMatch(string_piece, "(\\172)", &out); // match = true, out = "z"
// Should extract the byte \342, but it doesn't.
match = re2::RE2::PartialMatch(string_piece, "(\\342)", &out); // match = false

Set the encoding to Latin-1; RE2's default is UTF-8.

match = re2::RE2::PartialMatch(string_piece, re2::RE2("(\\342)", re2::RE2::Latin1), &out);
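
As a rough illustration of why the default UTF-8 mode misses the byte: with UTF-8 encoding, an escape such as \342 is, as far as I understand RE2's behavior, read as the code point U+00E2, which UTF-8 encodes as the two bytes 0xC3 0xA2. The subject string contains only the lone byte 0xE2, so nothing matches. The sketch below does not use RE2 at all; `encode_utf8_small` and `contains_bytes` are hypothetical helper names introduced only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode a code point below U+0800 as UTF-8.
// That range is enough to cover U+00E2 for this illustration.
std::string encode_utf8_small(unsigned int cp) {
    std::string out;
    if (cp < 0x80) {
        // ASCII: one byte, identical in Latin-1 and UTF-8.
        out.push_back(static_cast<char>(cp));
    } else {
        assert(cp < 0x800);
        // Two-byte form: 110xxxxx 10xxxxxx.
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Hypothetical helper: true if `needle` occurs as a byte sequence in `haystack`.
bool contains_bytes(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}
```

With the subject from the question ("abc", then the byte 0xE2, then "xyz"), `contains_bytes(subject, encode_utf8_small(0xE2))` is false, while a search for the single byte 0xE2 succeeds. The single-byte search is the comparison that Latin-1 mode effectively performs. For 'z' (octal 172) both encodings agree, which is why the first pattern in the question matched.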