Proper way to read big file into char array

Updated: 2023-10-16

I am using code from GeeksForGeeks and wanted to test it on large input files, so I tweaked the code to allocate the arrays dynamically.

I know I am getting this error:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

because I am running out of memory, but I don't know the proper way to read a big file.

For reference, this is the code I am using:

#include <iostream>
#include <fstream>
#include <iterator>
#include <string>
#include <string.h>
#include <time.h>
#include <stdio.h>
using namespace std;

// A utility function to find maximum of two integers
int max(int a, int b)
{   return (a > b)? a : b; }

/* Returns length of longest common substring of X[0..m-1]
   and Y[0..n-1] */
int LCSubStr(char *&X, char *&Y, int m, int n)
{
    // Create a table to store lengths of longest common suffixes of
    // substrings. Note that LCSuff[i][j] contains length of longest
    // common suffix of X[0..i-1] and Y[0..j-1]. The first row and
    // first column entries have no logical meaning, they are used only
    // for simplicity of program
    clock_t t;
    t = clock();

    // This table holds (m+1)*(n+1) ints in total
    int** LCSuff = new int*[m+1];
    for (int i = 0; i < m+1; ++i)
        LCSuff[i] = new int[n+1];

    int result = 0;  // To store length of the longest common substring

    /* Following steps build LCSuff[m+1][n+1] in bottom up fashion. */
    for (int i = 0; i <= m; i++)
    {
        for (int j = 0; j <= n; j++)
        {
            if (i == 0 || j == 0)
                LCSuff[i][j] = 0;
            else if (X[i-1] == Y[j-1])
            {
                LCSuff[i][j] = LCSuff[i-1][j-1] + 1;
                result = max(result, LCSuff[i][j]);
            }
            else LCSuff[i][j] = 0;
        }
    }

    t = clock() - t;
    // clock_t is not guaranteed to be int, so cast before printing
    printf("It took me %ld clicks (%f seconds).\n", (long)t, ((float)t)/CLOCKS_PER_SEC);
    cout<<"----------------------------"<<endl;

    // Release the table
    for (int i = 0; i < m+1; ++i)
        delete[] LCSuff[i];
    delete[] LCSuff;

    return result;
}

/* Driver program to test above function */
int main()
{
    std::ifstream in("F1.txt");
    std::string XS((std::istreambuf_iterator<char>(in)),
                   std::istreambuf_iterator<char>());
    std::ifstream inn("F2.txt");
    std::string YS((std::istreambuf_iterator<char>(inn)),
                   std::istreambuf_iterator<char>());

    // std::string::copy does not null-terminate, so reserve one extra
    // byte and terminate the buffers before calling strlen
    char *X = new char[XS.length() + 1];
    char *Y = new char[YS.length() + 1];
    XS.copy( X, XS.length() );
    YS.copy( Y, YS.length() );
    X[XS.length()] = '\0';
    Y[YS.length()] = '\0';

    int m = strlen(X);
    int n = strlen(Y);

    cout << "Length of Longest Common Substring is "
         << LCSubStr(X, Y, m, n)<<endl;
    return 0;
}

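You need to read it chunk by chunk rather than reading the whole file into memory. This means the algorithm has to be rewritten so that its state can be updated incrementally.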
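To get a feel for how quickly the whole-input approach runs out of memory, it can help to estimate the size of the table that LCSubStr allocates for inputs of length m and n. The helper below is a hypothetical sketch (the name table_bytes is not from the original code) and it ignores the extra array of row pointers:

```cpp
#include <cstdint>

// Hypothetical helper: approximate number of bytes needed for the
// (m+1) x (n+1) table of int used by LCSubStr. The array of row
// pointers and allocator overhead are not counted.
std::uint64_t table_bytes(std::uint64_t m, std::uint64_t n)
{
    return (m + 1) * (n + 1) * sizeof(int);
}
```

Assuming a 4-byte int, two inputs of 100,000 characters each already need roughly 40 GB for the table, so large inputs can trigger std::bad_alloc even when the files themselves are small compared to available memory.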

In your case, since your access to X and Y is sequential, you can do this without major changes to how the algorithm works.

In pseudocode:

state = lcs_init();
for (;;) {
    chunk = read();
    if (eof)
        break;
    state = lcs_update(state, chunk);
}
result = lcs_finish(state);

Typically, in C++ (and in OOP in general), this is done with a class that stores the state, i.e.:

LCS lcs;
for (each chunk)
    lcs.update(chunk);
result = lcs.result();
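One possible shape for such a class is sketched below. It is not the original answer's implementation: it assumes that X fits in memory while Y is streamed in chunks, and it replaces the full table with a single row of the longest-common-suffix table, which is enough because each row only depends on the previous one. The constructor argument and the update(chunk, len) signature are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Sketch of a stateful longest-common-substring computation.
// Assumption: X is held in memory, Y arrives in chunks via update().
class LCS {
public:
    explicit LCS(std::string x)
        : x_(std::move(x)), row_(x_.size() + 1, 0) {}

    // Processes the next `len` characters of Y.
    void update(const char* chunk, std::size_t len)
    {
        assert(chunk != nullptr || len == 0);
        for (std::size_t k = 0; k < len; ++k) {
            const char c = chunk[k];
            // Walk right to left so row_[i - 1] still holds the value
            // from the previous character of Y when row_[i] is updated.
            for (std::size_t i = x_.size(); i >= 1; --i) {
                row_[i] = (x_[i - 1] == c) ? row_[i - 1] + 1 : 0;
                best_ = std::max(best_, row_[i]);
            }
        }
    }

    // Length of the longest common substring seen so far.
    std::size_t result() const { return best_; }

private:
    std::string x_;                 // the in-memory input X
    std::vector<std::size_t> row_;  // one row of the suffix-length table
    std::size_t best_ = 0;
};
```

With this layout the memory use grows with the length of X only, not with the product of both lengths. The running time is still proportional to the product, so very large inputs remain slow.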