Iterator (string::iterator) behaving as if out of scope?

Keywords: iterator, scope, string. Updated: 2023-10-16

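I copy/paste PDF files into TXT as input, and I want to build a tree of "Sections". Each Section contains a title (e.g. "3.3 Evaluation of Methods") and a text (everything up to the next title). Both are implemented with iterator_range (which I typedef'd as string_range).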

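First, I have a function elsewhere that returns the title number plus the first word after it (for the example above it would return "3.3 Evaluation" and leave everything else under the text). The code below is meant to expand that title.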

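What it does is take the first sentence of the section text and extend the title up to the last capitalized word in that sentence, shrinking the text accordingly.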

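The while loop is there to get me to the last match. In the debugger, everything works perfectly inside the loop. As soon as I leave it, the iterator is garbage, and I don't understand why.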

You can try running the code yourself, since I have stripped out all of its other dependencies. It behaves exactly the same way.

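The string temp must be the cause, since it is the only thing scoped to the while loop. But that makes no sense, because I copy it into another variable, and that variable is the only thing the iterator sees. The other variable does not go out of scope, so why would the iterator change? I can't think of an explanation :-(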

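This does not seem to be specific to Boost either: the plain std::string::iterator unexplainable misbehaves in the same way, so the iterator_range class has nothing to do with this behavior...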

#include <string>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string_regex.hpp>
#include <boost/regex.hpp>

using namespace std;
using namespace boost;
typedef iterator_range<string::iterator> string_range;
int main() {
    string original_text("Mixed Initiative Dialogue Management 2.1 Motivation In naturally occurring human-human dialogues, speakers often adopt different dialogue strategies based on hearer characteristics, dialogue history, etc.For instance, the speaker may provide more guidance if the hearer is hav- ing difficulty making progress toward task completion, while taking a more passive approach when the hearer is an expert in the domain.Our main goal is to enable a spoken dialogue system to simulate such human be- havior by dynamically adapting dialogue strategies dur- ing an interaction based on information that can be au- tomatically detected from the dialogue. Figure 1 shows an excerpt from a dialogue between MIMIC and an ac- tual user where the user is attempting to find the times at which the movie Analyze This playing at theaters in Montclair. S and U indicate system and user utterances, respectively, and the italicized utterances are the output of our automatic speech recognizer.In addition, each system turn is annotated with its task and dialogue ini- tiative holders, where task initiative tracks the lead in the process toward achieving the dialogue participants' do- main goal, while dialogue initiative models the lead in determining the current discourse focus (Chu-Carroll and Brown, 1998). In our information query application do- main, the system has task (and thus dialogue) initiative if its utterances provide helpful guidance toward achieving the user's domain goal, as in utterances (6) and (7) where MIMIC provided valid response choices to its query in- tending to solicit a theater name, while the system has 97 dialogue but not task initiative if its utterances only spec- ify the current discourse goal, as in utterance (4). This dialogue illustrates several features of our adap- tive mixed initiative dialogue manager for dynamic");
    string_range text(original_text.begin(), original_text.end() );
    string first_sentence("Mixed Initiative Dialogue Management 2.1 Motivation In naturally occurring human-human dialogues, speakers often adopt different dialogue strategies based on hearer characteristics, dialogue history, etc.");
    regex capex("((^| )([A-Z][a-z]+|[A-Z]+) )"); // Capitalized word (or fullcapsed word)
    string_range capitalized_word;
    string::iterator unexplainable;
    int count = 0;
    while (find_regex(first_sentence, capex) ) { // Getting the last one
        capitalized_word = find_regex(first_sentence, capex);
        string temp(capitalized_word.end(), first_sentence.end() );
        first_sentence = temp;
        unexplainable = capitalized_word.begin(); // Here is fine
        count++;
    }
    if (count <= 1) return 0;
    string_range new_text_range(unexplainable, text.end()); // Here it gets full of junk... why??
    string new_string(new_text_range.begin(), new_text_range.end() );
    string_range new_text_range2(capitalized_word.begin(), text.end());
    return 0;
}

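Your problem is that you are mixing iterators from different sequences and trying to build a new sequence out of them. The unexplainable iterator points somewhere into the first_sentence string, while text.end() points to the end of the original_text string.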

Here is what the memory might look like:

      0123456789012345
      ----------------
   00 Hello World!0%&(
   16 %£$!*Bye world!0

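Now suppose unexplainable points at 6, i.e. at "World!", and text.end() points at 31. If you create a range from those two (and then a string from that range), you get garbage, because the resulting string looks like this: "World!0%&(%£$!*Bye world!". This is just a made-up example, but I hope you get the idea: don't mix iterators from different sequences.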
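One way to stay inside a single sequence is to carry an offset rather than an iterator. The sketch below assumes, as in the question, that first_sentence is a prefix of original_text, so an offset found in one is meaningful in the other. It uses std::regex instead of Boost and a simplified pattern, and the helper names last_capitalized_offset and text_from_offset are made up for illustration. Note also that assigning a new value to first_sentence inside the loop can invalidate any iterators into it, which is another reason not to keep them around.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns the offset of the
// last capitalized word in `sentence`, or std::string::npos if there is none.
std::size_t last_capitalized_offset(const std::string& sentence) {
    // Simplified pattern, assumed for illustration: a word starting with a capital.
    static const std::regex capex(R"(\b[A-Z][A-Za-z]*\b)");
    std::size_t offset = std::string::npos;
    for (std::sregex_iterator it(sentence.begin(), sentence.end(), capex), end;
         it != end; ++it) {
        offset = static_cast<std::size_t>(it->position());
    }
    return offset;
}

// Hypothetical helper: copies the tail of `text` starting at `offset`.
// Both ends of the range come from the same string, so the range is valid.
std::string text_from_offset(const std::string& text, std::size_t offset) {
    assert(offset <= text.size());
    return std::string(text.begin() + static_cast<std::ptrdiff_t>(offset),
                       text.end());
}
```

With these, the new text would be obtained as text_from_offset(original_text, last_capitalized_offset(first_sentence)), after checking that the offset is not std::string::npos.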

Here is another tip for free: don't evaluate find_regex() twice. Change the loop to something like this:

do
{
  capitalized_word = find_regex(first_sentence, capex);
  if(capitalized_word)
  {
    // do stuff
  }
}while(capitalized_word);
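For reference, here is a fuller sketch of the same idea: one search per iteration, and no copying or reassigning of the string being searched, so every iterator stays valid and refers to the same sequence. It is written against std::regex rather than Boost's find_regex, which is an assumption made to keep it self-contained. LastMatch and find_last_match are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct LastMatch {
    std::string::const_iterator begin; // start of the last match (meaningful only if count > 0)
    int count;                         // number of matches found
};

// Finds the last match of `pattern` in `sentence`. The search window is
// narrowed by advancing an iterator instead of building a shorter string.
LastMatch find_last_match(const std::string& sentence, const std::regex& pattern) {
    LastMatch result{sentence.end(), 0};
    std::string::const_iterator search_from = sentence.begin();
    std::smatch match;
    while (std::regex_search(search_from, sentence.end(), match, pattern)) {
        // An empty match would never advance search_from and loop forever.
        assert(match[0].length() > 0);
        result.begin = match[0].first;
        ++result.count;
        search_from = match[0].second; // continue right after this match
    }
    return result;
}
```

The returned iterator points into the caller's string, so it can safely be paired with that same string's end() to build the remaining range, as long as the string is not modified in between.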