Avoid grabbing nothing from string stream

Updated: 2023-10-16

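I am writing an assembler for a very basic ISA. Right now I am implementing the parser function, and I use a string stream to grab the words from each line. Here is a sample of the assembly code: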

; This program counts from 10 to 0
        .ORIG x3000
        LEA R0, TEN     ; This instruction will be loaded into memory location x3000
        LDW R1, R0, #0
START   ADD R1, R1, #-1
        BRZ DONE
        BR  START
                        ; blank line
DONE    TRAP    x25     ; The last executable instruction
TEN     .FILL   x000A   ; This is 10 in 2's comp, hexadecimal
        .END

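Don't worry about what the assembly code does; just look at line 3, the one with the comment on the right. My parser function is not complete yet, but this is what I have so far: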

// Define three conditions to code
enum {DONE, OK, EMPTY_LINE};
// Tuple containing a condition and a string vector
typedef tuple<int,vector<string>> Code;
// Passed an alias to a string
// Parses the line passed to it
Code ReadAndParse(string& line)
{
    /***********************************************/
    /****************REMOVE COMMENTS****************/
    /***********************************************/
    // Sentinel to flag down position of first
    // semicolon and the index position itself
    bool found = false;
    size_t semicolonIndex = -1;
    // Convert the line to lowercase
    for(int i = 0; i < line.length(); i++)
    {
        line[i] = tolower(line[i]);
        // Find first semicolon
        if(line[i] == ';' && !found)
        {
            semicolonIndex = i;
            // Throw the flag
            found = true;
        }
    }
    // Erase anything to and from semicolon to ignore comments
    if(found != false)
        line.erase(semicolonIndex);

    /***********************************************/
    /*****TEST AND SEE IF THERE'S ANYTHING LEFT*****/
    /***********************************************/
    // To snatch and store words
    Code code;
    string token;
    stringstream ss(line);
    vector<string> words;
    // While the string stream is still of use
    while(ss.good())
    {
        // Send the next string to the token
        ss >> token;
        // Push it onto the words vector
        words.push_back(token);
        // If all we got was nothing, it's an empty line
        if(token == "")
        {
            code = make_tuple(EMPTY_LINE, words);
            return code;
        }
    }
    /***********************************************/
    /***********DETERMINE OUR TYPE OF CODE**********/
    /***********************************************/

    // At this point it should be fine
    code = make_tuple(OK, words);
    return code;
}

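As you can see, the Code tuple holds a condition, drawn from the enum declaration, and a vector containing all the words in the line. What I want is for every word in a line to be pushed onto the vector and then returned.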

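The problem shows up on the third call of the function (the third line of the assembly code). I use ss.good() to determine whether there are any words left in the string stream. For some reason ss.good() returns true even though there is no fourth word on the third line, and I end up with the words [lea] [r0,] [ten] and [ten] pushed onto the vector. ss.good() is true on the fourth pass through the loop, token receives nothing, and so [ten] gets pushed onto the vector twice.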

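I noticed that if I remove the whitespace between the last word and the semicolon, the error does not occur. I would like to know how to get the correct number of words pushed onto the vector.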

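Please do not recommend the Boost library. I love the library, but I want to keep this project simple. It is nothing big; this processor has only a dozen or so instructions. Also, bear in mind that this function is only half-baked, and I am testing and debugging it incrementally.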

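A stream's state flags are only set after the condition (such as reaching the end of the stream) has actually occurred. After [ten] is read, the trailing whitespace is still unread, so ss.good() is still true. The next extraction then skips that whitespace, hits the end of the stream and fails, leaving token with its previous value.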
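The effect can be reproduced in isolation. The following is a minimal sketch, not code from the question: `tokenize_with_good` is a hypothetical helper that keeps only the loop shape from `ReadAndParse` (test `good()`, extract, push).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: same loop shape as in the question.
// good() is tested BEFORE the extraction that may fail.
std::vector<std::string> tokenize_with_good(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<std::string> words;
    std::string token;
    while (ss.good())
    {
        ss >> token;            // on failure, token keeps its old value
        words.push_back(token); // pushed whether or not the read succeeded
    }
    return words;
}
```

Called with `"lea r0, ten     "` (line 3 after the comment has been erased, trailing spaces included), this returns four entries, with `ten` appearing twice. Called with `"lea r0, ten"`, it returns three, because reading `ten` runs into the end of the stream and sets the end-of-file flag immediately.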

Try replacing your loop with one that uses the extraction itself as the condition:

while(ss >> token)
{
    // Only reached when the extraction succeeded,
    // so token is never empty here
    words.push_back(token);
}
// If no word was extracted at all, it's an empty line
if(words.empty())
{
    code = make_tuple(EMPTY_LINE, words);
    return code;
}

With this code, run on the raw line with the comment still in place, I get the following tokens for line 3:

"LEA"
"R0,"
"TEN"
";"
"This"
"instruction"
"will"
"be"
"loaded"
"into"
"memory"
"location"
"x3000"
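Putting the comment removal from the question together with the extraction-as-condition loop, a self-contained version might look like the sketch below. `split_line` is a hypothetical name, not part of the original code, and it returns only the word vector; wrapping the result in the Code tuple is left out. It uses `std::string::find` in place of the hand-written semicolon search, which is a substitution on my part.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: strips a ';' comment, lowercases what is left,
// and splits it into whitespace-separated words.
std::vector<std::string> split_line(std::string line)
{
    // Drop everything from the first semicolon onward.
    const std::string::size_type semicolon = line.find(';');
    if (semicolon != std::string::npos)
        line.erase(semicolon);

    // Lowercase; the unsigned char cast keeps tolower well-defined
    // for characters with negative values.
    for (char& c : line)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

    std::istringstream ss(line);
    std::vector<std::string> words;
    std::string token;
    // A failed read ends the loop before push_back is reached.
    while (ss >> token)
        words.push_back(token);

    return words; // empty for blank or comment-only lines
}
```

For line 3 of the sample program this yields the three words `lea`, `r0,` and `ten`; for the comment-only line it yields an empty vector, which the caller can map to EMPTY_LINE.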

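I know the language you are trying to parse is simple. Even so, you would be doing yourself a favor if you considered using a tool built for the job, such as flex.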