Parsing a complex string

Keywords: string, complex. Updated: 2023-10-16

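I need to read a string in the following order:

Read any number of space-separated numbers, discard all but the last one, and store it in n
Read one space followed by n characters followed by a space, and store only the characters
Read two more space-separated numbers and store both

I would like to use a stringstream to read the numbers and stop at the string, but I don't know how to look ahead for the string and stop reading numbers without "reading" the string as a number and putting the stringstream into a failed state.
How can I detect the string in advance and stop reading numbers before it?
Is there a better way to parse this whole pattern?
I'm using C++11.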

Edit:
Example input:

1 2 3 4 6 abc de 7 8

Expected output:

The string: 'abc de'
Number 1: 7
Number 2: 8

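As I see it there are two options: either use a regular expression, or walk through the input character by character with some kind of state machine.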
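For the regular-expression option, here is a minimal sketch using C++11 `std::regex`. A regular expression cannot count up to a value it has just matched, so the regex is only used to find the last leading number and the two trailing numbers, while the n characters are cut out by position. The names `Parsed` and `parse_with_regex` are hypothetical, and the sketch assumes the embedded string does not begin with a number followed by a space and that all numbers fit in their types.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct Parsed {
    bool ok;
    std::string text;
    int n1;
    int n2;
};

Parsed parse_with_regex(const std::string& input) {
    Parsed result{false, "", 0, 0};
    // Leading numbers: only the last one is captured, then one whitespace.
    static const std::regex head(R"(^\s*(?:\d+\s+)*(\d+)\s)");
    // Trailing part: whitespace, number, whitespace, number.
    static const std::regex tail(R"(\s+(\d+)\s+(\d+)\s*)");

    std::smatch m;
    if (!std::regex_search(input, m, head))
        return result;

    // Assumption: the length fits in unsigned long (std::stoul throws otherwise).
    const std::size_t n = std::stoul(m[1].str());
    const std::size_t start =
        static_cast<std::size_t>(m.position(0) + m.length(0));
    if (n > input.size() - start)
        return result;  // fewer than n characters left

    result.text = input.substr(start, n);
    assert(result.text.size() == n);

    const std::string rest = input.substr(start + n);
    std::smatch t;
    if (!std::regex_match(rest, t, tail))
        return result;

    // Assumption: both trailing numbers fit in int (std::stoi throws otherwise).
    result.n1 = std::stoi(t[1].str());
    result.n2 = std::stoi(t[2].str());
    result.ok = true;
    return result;
}
```

For the sample input from the question, `parse_with_regex("1 2 3 4 6 abc de 7 8")` sets `text` to `abc de`, `n1` to 7 and `n2` to 8.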

Edit

About that state machine... maybe something like this:

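// Pre-conditions: "str" is a std::string containing the whole string to be parsed
// Needs <cctype>, <cstdlib> and <string>
enum class states
{
    GET_LENGTH,           // Looking for the embedded string length
    GET_LENGTH_OR_STRING, // Decide whether another length or the string follows
    GET_STRING,           // Getting the embedded string
    GET_NUMBER_1,         // Look for the first number after the string
    GET_NUMBER_2,         // Look for the second number after the string
    DONE                  // Everything was read, or the input was malformed
};
int         len = 0;        // Length of the embedded string
std::string tmp;            // Digits of the number currently being read
std::string embedded;       // The embedded string
int         n1 = 0, n2 = 0; // The numbers after the string
states      state = states::GET_LENGTH;
auto        ci = str.begin();
while (state != states::DONE)
{
    // Skip whitespace, taking care not to dereference the end iterator
    while (ci != str.end() && isspace(static_cast<unsigned char>(*ci)))
        ++ci;
    if (ci == str.end())
        break;
    tmp.clear(); // Start every number with an empty digit buffer
    switch (state)
    {
    case states::GET_LENGTH:
        while (ci != str.end() && isdigit(static_cast<unsigned char>(*ci)))
            tmp += *ci++;
        len = strtol(tmp.c_str(), nullptr, 10);
        state = states::GET_LENGTH_OR_STRING;
        break;
    case states::GET_LENGTH_OR_STRING:
        // A digit starts another candidate length, anything else starts the string
        if (isdigit(static_cast<unsigned char>(*ci)))
            state = states::GET_LENGTH;
        else
            state = states::GET_STRING;
        break;
    case states::GET_STRING:
        if (str.end() - ci < len)
        {
            state = states::DONE; // Fewer than "len" characters are left
            break;
        }
        embedded = std::string(ci, ci + len);
        ci += len;
        state = states::GET_NUMBER_1;
        break;
    case states::GET_NUMBER_1:
        while (ci != str.end() && isdigit(static_cast<unsigned char>(*ci)))
            tmp += *ci++;
        n1 = strtol(tmp.c_str(), nullptr, 10);
        state = states::GET_NUMBER_2;
        break;
    case states::GET_NUMBER_2:
        while (ci != str.end() && isdigit(static_cast<unsigned char>(*ci)))
            tmp += *ci++;
        n2 = strtol(tmp.c_str(), nullptr, 10);
        state = states::DONE;
        break;
    case states::DONE:
        break;
    }
}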

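Disclaimer: this is untested, it was written "as is" directly in the browser.

The code could probably be better. For example, the states that read the length and the trailing numbers are basically the same, so they could share a separate function.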
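The shared function suggested here might look like the sketch below. `read_number`, `Record` and `parse_record` are hypothetical names introduced for illustration; the sketch assumes non-negative decimal numbers that fit in an `int` (overflow is not checked) and an embedded string that does not start with a digit.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: skips whitespace, then reads a run of decimal digits.
// On success stores the value in `out`, leaves `it` just past the last digit
// and returns true. If no digit follows the whitespace, `out` is untouched.
bool read_number(std::string::const_iterator& it,
                 std::string::const_iterator end, int& out) {
    while (it != end && std::isspace(static_cast<unsigned char>(*it)))
        ++it;
    if (it == end || !std::isdigit(static_cast<unsigned char>(*it)))
        return false;
    int value = 0;
    while (it != end && std::isdigit(static_cast<unsigned char>(*it)))
        value = value * 10 + (*it++ - '0');
    out = value;
    return true;
}

// Hypothetical result type for this sketch.
struct Record {
    std::string text;
    int n1;
    int n2;
};

bool parse_record(const std::string& str, Record& rec) {
    auto it = str.cbegin();
    const auto end = str.cend();
    int len = 0;
    bool any = false;
    // Keep reading numbers; only the last one survives in `len`.
    while (read_number(it, end, len))
        any = true;
    // `it` now points at the first character of the embedded string.
    if (!any || end - it < len)
        return false;
    rec.text.assign(it, it + len);
    assert(rec.text.size() == static_cast<std::size_t>(len));
    it += len;
    // The same helper reads the two trailing numbers.
    return read_number(it, end, rec.n1) && read_number(it, end, rec.n2);
}
```

Because the helper leaves its output untouched on failure, the loop that discards the leading numbers ends with the last length still stored in `len`.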

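You can do this without any regular expression, using only standard C++ stream functionality. Below is an example that uses std::cin as the input stream, but you can use a string stream instead if you want to read from a string.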

#include <iostream>
#include <string>
#include <vector>
int main(int argc, char* const argv[]) {
        int n = 0, tmp;
        /// read integers, discarding all but the last
        while(std::cin >> tmp)
                n = tmp;
        if(std::cin.bad() || n < 0) {
                std::cout << "bad format 1" << std::endl;
                return -1;
        }
        /// the failed extraction set failbit: clear it before using the stream again
        std::cin.clear();
        /// skip whitespaces
        std::cin >> std::ws;
        /// read a string of 'n' characters
        std::vector<char> buffer(n+1, '\0');
        if(! std::cin.read(buffer.data(), n) ) {
                std::cout << "bad format 2" << std::endl;
                return -1;
        }
        std::string s(buffer.data());
        /// Read 2 numbers
        int nb1, nb2;
        if(! (std::cin >> nb1 >> nb2)) {
                std::cout << "bad format 3" << std::endl;
                return -1;
        }
        std::cout << "The string: " << s << std::endl;
        std::cout << "Number 1: " << nb1 << std::endl;
        std::cout << "Number 2: " << nb2 << std::endl;
        return 0;
}
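To read from a string instead, and to avoid the failed extraction altogether, one option is to peek at the next character before each read and stop as soon as it is not a digit. The following is a minimal sketch of that idea with `std::istringstream`; `parse_stream` is a hypothetical name, and the sketch assumes the embedded string does not start with a digit.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if the whole pattern was read.
bool parse_stream(std::istream& in, std::string& text, int& n1, int& n2) {
    int n = -1;
    // Look ahead: extract an integer only while the next non-blank
    // character is a digit, so the stream never fails on the string.
    while (in >> std::ws && std::isdigit(in.peek())) {
        if (!(in >> n))
            return false;  // e.g. the number does not fit in an int
    }
    if (n < 0)
        return false;      // no leading number was found

    // std::ws already consumed the separator, so read exactly n characters.
    text.resize(static_cast<std::size_t>(n));
    if (!in.read(&text[0], n))
        return false;
    assert(in.gcount() == n);

    // Read the two trailing numbers.
    return static_cast<bool>(in >> n1 >> n2);
}
```

It can be called with `std::istringstream in("1 2 3 4 6 abc de 7 8");` followed by `parse_stream(in, text, n1, n2)`, or with `std::cin` directly, since it only relies on the `std::istream` interface.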

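I don't know C++ very well, but couldn't you do this?

Split the input on the space delimiter
Go through that list:
- while the item is a number, store it in the same variable
- store the n characters (I assume you mean there is a string there)
- store the last two numbers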
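The steps above could be sketched in C++11 as follows. `is_number` and `parse_tokens` are hypothetical names; the sketch assumes the numbers are separated by single spaces, that the embedded string does not start with an all-digit word, and that every number fits in its type. Splitting on single spaces keeps empty tokens for consecutive spaces, so the original spacing inside the string is restored when the tokens are joined again.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: true if the token consists only of decimal digits.
static bool is_number(const std::string& tok) {
    return !tok.empty() &&
           std::all_of(tok.begin(), tok.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

bool parse_tokens(const std::string& line, std::string& text, int& n1, int& n2) {
    // Split on single spaces.
    std::vector<std::string> tokens;
    std::istringstream in(line);
    for (std::string tok; std::getline(in, tok, ' '); )
        tokens.push_back(tok);

    // While the item is a number, store it in the same variable.
    std::size_t i = 0;
    std::size_t n = 0;
    while (i < tokens.size() && is_number(tokens[i]))
        n = std::stoul(tokens[i++]);
    if (i == 0)
        return false;  // the input did not start with a number

    // Join the following tokens until n characters have been collected.
    text.clear();
    bool first = true;
    while (i < tokens.size() && text.size() < n) {
        if (!first)
            text += ' ';
        text += tokens[i++];
        first = false;
    }
    if (text.size() != n || tokens.size() - i != 2)
        return false;
    assert(i + 2 == tokens.size());  // exactly two tokens remain

    // Store the last two numbers.
    if (!is_number(tokens[i]) || !is_number(tokens[i + 1]))
        return false;
    n1 = std::stoi(tokens[i]);
    n2 = std::stoi(tokens[i + 1]);
    return true;
}
```

Unlike the character-based approaches, this version rejects input where n does not land exactly on a token boundary, which doubles as a consistency check on the stated length.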

Since you are using a C++11 compiler, you could write the grammar in AXE:

// input text
std::string txt("1 2 3 4 6 abc de 7 8");
// assume spaces are ' ' and tabs
auto space = axe::r_any(" \t");
// create a number rule that stores matched decimal numbers in 'n'
int n = 0;
auto number_rule = axe::r_decimal(n) % +space;
// create a string rule, which stops when reaching 'n' characters
std::string s;
int count = 0;
auto string_rule = space & 
    *(axe::r_any() & axe::r_bool([&](...){ return n > count++; })) >> s;
// tail rule for two decimal values
int n1 = 0, n2 = 0;
auto tail_rule = +space & axe::r_decimal(n1) & +space & axe::r_decimal(n2);
// a rule for entire input text
auto rule = number_rule & string_rule & tail_rule;
// run parser
rule(txt.begin(), txt.end());
// dump results, you should see: n=6, s=abc de, n1=7, n2=8
std::cout << "\nn=" << n << ", s=" << s << ", n1=" << n1 << ", n2=" << n2;