Applying tabify to my text file causes my file reader to seg fault

I have a file records.txt containing the following text:

John    Smith   Sales   555-1234
Mary    Jones   Wages   555-9876
Paul    Harris  Accts   555-4321

I copied the following code from C++ Programming by Mike McGrath into a file format.cpp to read the data in records.txt:

#include <fstream>
#include <string>
#include <iostream>
using namespace std;
int main()
{
  const int RANGE = 12;
  string tab[RANGE];
  int i = 0, j = 0;
  ifstream reader("records.txt");
  if (!reader)
    {
      cout << "Error opening input file" << endl;
      return -1;
    }
  while (!reader.eof())
    {
      if ((i + 1) % 4 == 0)
        getline(reader, tab[i++], '\n');
      else
        getline(reader, tab[i++], '\t');
    }
  reader.close();
  i = 0;
  while (i < RANGE)
    {
      cout << endl << "Record Number: " << ++j << endl;
      cout << "Forename: " << tab[i++] << endl;
      cout << "Surname: " << tab[i++] << endl;
      cout << "Daprtment: " << tab[i++] << endl;
      cout << "Telephone: " << tab[i++] << endl;
    }
  return 0;
}

Now, in my .emacs file, I have the following setting, which automatically converts tabs to spaces in all my files:

(setq-default indent-tabs-mode nil)

So when I compile the code and run format.out, I get the following output:

$ ./format.out 
Record Number: 1
Forename: John    Smith   Sales   555-1234
Mary    Jones   Wages   555-9876
Paul    Harris  Accts   555-4321
Surname: 
Daprtment: 
Telephone: 
Record Number: 2
Forename: 
Surname: 
Daprtment: 
Telephone: 
Record Number: 3
Forename: 
Surname: 
Daprtment: 
Telephone:

This is not what I want. What I want is for each tab-separated item to be printed after its corresponding label.

So I went into emacs and entered the following command to convert the spaces in records.txt to tabs:

M-x tabify

But now when I rerun my program, I get a segfault:

$ ./format.out 
Segmentation fault (core dumped)

Why is this, and what can I do to fix it? (Or, if the cause is not obvious, what can I do to investigate further?)

The problem seems to be in my C++ code rather than in the file itself, because when I read records.txt in Python, I can see that its contents are as expected:

In [1]: with open('records.txt') as f:
   ...:     x = f.readlines()
   ...:     
In [2]: x
Out[2]: 
['John\tSmith\tSales\t555-1234\n',
 'Mary\tJones\tWages\t555-9876\n',
 'Paul\tHarris\tAccts\t555-4321\n']

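You could start by reading up on why while (!reader.eof()) is wrong; see the question "Why is iostream::eof inside a loop condition considered wrong?". It seems the book you copied this code from is not very good.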

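I expect this is the cause of your segfault: because of the incorrect eof check, you go round your loop one time too many (a 13th getline call after all 12 fields have been read) and make an out-of-bounds access on your array at tab[12]. You can check this by increasing the size of your array to 13.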

Get a better book (which book is it, by the way?).

Here is one possible way to read the file (untested code):

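while (i < RANGE) // never read into an element past the end of tab
{
    char delim;
    if ((i + 1) % 4 == 0)
        delim = '\n';
    else
        delim = '\t';
    if (!getline(reader, tab[i], delim))
        break; // eof or some other error
    ++i; // count the field only after a successful read
}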