Tokenize elements from a text file by removing comments, extra spaces and blank lines in C++

Keywords: text, elements, file, blank lines, remove, comments, spaces. Updated: 2023-10-16

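I am trying to strip comments, blank lines, and extra spaces from a text file and then tokenize the remaining elements. Each token needs a space before and after it.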

exampleFile.txt
var
/* declare variables */a1 ,
b2a ,     c,

Here is what I have working so far:

string line; // holds the text read from the file
ifstream InputFile("exampleFile.txt", ios::in); // read from exampleFile.txt
// Remove comments
// The delimiter is assumed to be '\0', so getline reads the whole file at once
while (InputFile && getline(InputFile, line, '\0'))
{
    size_t Begin;
    while ((Begin = line.find("/*")) != string::npos)
    {
        size_t End = line.find("*/", Begin + 2);
        if (End == string::npos)
        {
            line.erase(Begin); // unterminated comment: drop the rest
            break;
        }
        line.erase(Begin, End - Begin + 2); // erase from "/*" through the closing "*/"
    }
}

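This removes the comments, but I can't find a way to tokenize while that is happening.

So my questions are:

Is it possible to remove comments, spaces, and blank lines and tokenize, all in this one place?
How can I implement a function that adds a space before and after each token as it is tokenized? A token like c, needs to be recognized as c and , separately.

Thank you for your help!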

If you need to skip whitespace characters and don't care about newlines, I suggest reading the file with operator>>. You can simply write:

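std::ifstream file("exampleFile.txt"); // input stream; file name taken from the question
std::string word;
bool isComment = false; // true while we are inside a /* ... */ comment
while (file >> word)
{
    if (isInsideComment(word, isComment))
        continue;
    // do processing of the tokens here
    std::cout << word << std::endl;
}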

The helper function can be implemented as follows:

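// Returns true if word belongs to a comment and should be skipped.
// May trim a comment marker off word; isComment carries the state across calls.
bool isInsideComment(std::string &word, bool &isComment)
{
    const std::string tagStart = "/*";
    const std::string tagStop = "*/";
    // match start marker (the size check keeps the comparison in bounds for short words)
    if (word.size() >= tagStart.size() &&
        std::equal(tagStart.rbegin(), tagStart.rend(), word.rbegin())) // ends with tagStart
    {
        isComment = true;
        if (word == tagStart)
            return true;
        word = word.substr(0, word.find(tagStart));
        return false;
    }
    // match end marker
    if (isComment)
    {
        if (word.size() >= tagStop.size() &&
            std::equal(tagStop.begin(), tagStop.end(), word.begin())) // starts with tagStop
        {
            isComment = false;
            word = word.substr(tagStop.size());
            return word.empty(); // a bare "*/" leaves nothing to emit
        }
        return true;
    }
    return false;
}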

For your example, this prints:

var
a1
,
b2a
,
c,
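
Note that the last token, c, , still has the comma attached, whereas the question asks for c and , to be recognized separately. One possible way to handle that is to post-process each word read by operator>>. The sketch below is only an illustration: splitWord is a hypothetical helper that is not part of the code above, and it assumes that a token is either a run of letters, digits, and underscores or a single punctuation character.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not from the original answer):
// splits one whitespace-delimited word into identifier-like runs and
// single punctuation characters, e.g. "c," becomes "c" and ",".
std::vector<std::string> splitWord(const std::string &word)
{
    std::vector<std::string> tokens;
    std::string current; // identifier-like run collected so far
    for (unsigned char ch : word)
    {
        if (std::isalnum(ch) || ch == '_')
        {
            current += static_cast<char>(ch);
        }
        else
        {
            // A punctuation character ends the current run...
            if (!current.empty())
            {
                tokens.push_back(current);
                current.clear();
            }
            // ...and becomes a token of its own.
            tokens.push_back(std::string(1, static_cast<char>(ch)));
        }
    }
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

Inside the reading loop, each word that is not part of a comment could be passed through splitWord and the resulting tokens printed one by one, each followed by a space if that is the required output format.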

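In case you are interested, the logic above should also handle multi-line comments.

Note, however, that the function implementation should be adapted to your assumptions about the comment tokens. For example, are they always separated from other words by whitespace? Or might an expression such as var1/*comment*/var2 need to be parsed? The example above will not work in that case.

So another option is (as you have already started to implement) to read lines, or even one large chunk of data from the file (to make sure the start and end comment tokens can be matched), locate the comment markers using find or a regular expression, and remove them afterwards.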
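
The whole-buffer approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: stripComments and tokenize are hypothetical names, the file contents are assumed to be already loaded into a std::string, each comment is replaced by a single space so that var1/*comment*/var2 yields two tokens, and an unterminated comment is assumed to run to the end of the input.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns text with every /* ... */ comment
// replaced by one space. An unterminated comment is assumed to
// extend to the end of the input.
std::string stripComments(const std::string &text)
{
    std::string out;
    std::size_t pos = 0; // start of the not-yet-copied part of text
    while (true)
    {
        std::size_t begin = text.find("/*", pos);
        if (begin == std::string::npos)
        {
            // No more comments: copy the remainder and stop.
            out.append(text, pos, std::string::npos);
            break;
        }
        out.append(text, pos, begin - pos); // copy the text before the comment
        out += ' ';                         // the comment acts as a separator
        std::size_t end = text.find("*/", begin + 2);
        if (end == std::string::npos)
            break; // unterminated comment: drop the rest
        pos = end + 2; // continue right after the closing marker
    }
    return out;
}

// Hypothetical helper: strips comments, then splits on whitespace,
// which also discards extra spaces and blank lines.
std::vector<std::string> tokenize(const std::string &text)
{
    std::vector<std::string> tokens;
    std::istringstream in(stripComments(text));
    std::string word;
    while (in >> word)
        tokens.push_back(word);
    return tokens;
}
```

The file contents could be loaded into a string first, for instance by streaming file.rdbuf() into a std::ostringstream and calling str() on it. Because comments are removed before the split, this version does not depend on comment markers being surrounded by whitespace.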