Is there a better way to handle incomplete data in a buffer and reading?

Updated: 2023-10-16

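I am processing binary files that are built from events. Each event can have a variable length. Since my read buffer has a fixed size, I handle things as follows: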

const int bufferSize = 0x500000;
const int readSize = 0x400000;
const int eventLengthMask = 0x7FFE0000;
const int eventLengthShift = 17;
const int headerLengthMask = 0x1F000;
const int headerLengthShift = 12;
const int slotMask = 0xF0;
const int slotShift = 4;
const int channelMask = 0xF;
...
//allocate the buffer; we allocate 5 MB even though we read in 4 MB chunks
//to deal with unprocessed data from the end of a read
char* allocBuff = new char[bufferSize]; //inFile reads data into here
unsigned int* buff = reinterpret_cast<unsigned int*>(allocBuff); //data is interpreted from here
inFile.open(fileName.c_str(),ios_base::in | ios_base::binary);
int startPos = 0;
while(!inFile.eof())
{
    int index = 0;
    inFile.read(&(allocBuff[startPos]), readSize);
    int size = ((readSize + startPos)>>2);
    //loop to process the buffer
    while (index<size)
    {
        unsigned int data = buff[index];
        int eventLength = ((data&eventLengthMask)>>eventLengthShift);
        int headerLength = ((data&headerLengthMask)>>headerLengthShift);
        int slot = ((data&slotMask)>>slotShift);
        int channel = data&channelMask;
        //now check if the full event is in the buffer
        if( (index+eventLength) > size )
        {//the full event is not in the buffer
            break;
        }
        ++index;
        //further processing of the event
    }
    //move the data at the end of the buffer to the beginning and set start position
    //for the next read
    for(int i = index; i<size; ++i)
    {
        buff[i-index] = buff[i];
    }
    startPos = ((size-index)<<2);
}

My question is: is there a better way to handle the unprocessed data at the end of the buffer?
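For reference, here is a minimal sketch of decoding one header word with the masks and shifts from the question's code, rewritten as unsigned 32-bit constants. The struct and function names are hypothetical. The field widths follow directly from the masks: for example, 0x7FFE0000 >> 17 leaves a 14-bit event length, and 0x1F000 >> 12 a 5-bit header length.

```cpp
#include <cstdint>

// Hypothetical struct holding the fields the question extracts from a header word.
struct EventHeader {
    std::uint32_t eventLength;   // bits 17..30
    std::uint32_t headerLength;  // bits 12..16
    std::uint32_t slot;          // bits 4..7
    std::uint32_t channel;       // bits 0..3
};

// Same masks and shifts as in the question, as unsigned constants.
constexpr std::uint32_t eventLengthMask   = 0x7FFE0000u;
constexpr int           eventLengthShift  = 17;
constexpr std::uint32_t headerLengthMask  = 0x1F000u;
constexpr int           headerLengthShift = 12;
constexpr std::uint32_t slotMask          = 0xF0u;
constexpr int           slotShift         = 4;
constexpr std::uint32_t channelMask       = 0xFu;

// Mask each field out of the word, then shift it down to bit 0.
constexpr EventHeader decodeHeader(std::uint32_t word) {
    return EventHeader{
        (word & eventLengthMask) >> eventLengthShift,
        (word & headerLengthMask) >> headerLengthShift,
        (word & slotMask) >> slotShift,
        word & channelMask,
    };
}
```

For example, the word (10 << 17) | (3 << 12) | (5 << 4) | 2 decodes to an event length of 10, a header length of 3, slot 5 and channel 2. Bit 31 is not covered by any mask and is ignored.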

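You could improve on this by using a circular buffer instead of a simple array, that is, a circular iterator over the array. Then you don't need to do all that copying; instead, the "start" of the array moves.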

Other than that, no, not really.
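A minimal sketch of the circular-buffer idea, assuming a byte-oriented ring over a std::vector and 32-bit words stored little-endian in the file (both are assumptions, as are the class and member names). Consuming an event only advances the read position, so leftover bytes are never copied back to the start. The price is that a word can straddle the wrap point, which is why words are assembled byte by byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity byte ring (capacity must be > 0).
// head_ is the position of the next unread byte; size_ counts unread bytes.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : buf_(capacity) {}

    std::size_t size() const { return size_; }
    std::size_t freeSpace() const { return buf_.size() - size_; }

    // Append up to n bytes after the unread data; returns how many were stored.
    std::size_t write(const char* src, std::size_t n) {
        if (n > freeSpace()) n = freeSpace();
        const std::size_t tail = (head_ + size_) % buf_.size();
        for (std::size_t i = 0; i < n; ++i)
            buf_[(tail + i) % buf_.size()] = static_cast<unsigned char>(src[i]);
        size_ += n;
        return n;
    }

    // Byte at logical offset i from the read position (caller ensures i < size()).
    unsigned char at(std::size_t i) const { return buf_[(head_ + i) % buf_.size()]; }

    // 32-bit little-endian word at logical offset i (caller ensures i + 4 <= size()).
    std::uint32_t wordAt(std::size_t i) const {
        return  static_cast<std::uint32_t>(at(i))
             | (static_cast<std::uint32_t>(at(i + 1)) << 8)
             | (static_cast<std::uint32_t>(at(i + 2)) << 16)
             | (static_cast<std::uint32_t>(at(i + 3)) << 24);
    }

    // Discard n processed bytes by moving the read position forward; no copying.
    void consume(std::size_t n) {
        if (n > size_) n = size_;
        head_ = (head_ + n) % buf_.size();
        size_ -= n;
    }

private:
    std::vector<unsigned char> buf_;
    std::size_t head_ = 0;
    std::size_t size_ = 0;
};
```

Reading straight from a stream into the ring would need up to two read() calls per refill: one for the free region up to the physical end of the array, and one for the part that wraps around to the front. The write() above sidesteps that by copying from a caller-supplied buffer.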

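When I've run into this problem in the past, I simply copied the unprocessed data down to the start of the buffer and then read in after it. This is a valid solution (and by far the simplest one) if the individual elements are fairly small and the buffer is large. (On a modern machine, "fairly small" can easily mean a few hundred KB.) Of course, you have to keep track of how much you've copied down, so you can adjust the pointer and the size for the next read.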
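The copy-down approach can be sketched with std::memmove. The helper name is hypothetical; it returns the number of bytes kept, which is both the offset at which the next read should start and the amount by which that read must be shortened.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: buf holds `filled` valid bytes, of which the first
// `consumed` have been processed. Moves the unprocessed tail to the front
// of buf and returns its length.
std::size_t compactLeftover(std::vector<char>& buf,
                            std::size_t consumed, std::size_t filled) {
    const std::size_t leftover = filled - consumed;
    if (leftover != 0 && consumed != 0) {
        // memmove rather than memcpy: source and destination may overlap.
        std::memmove(buf.data(), buf.data() + consumed, leftover);
    }
    return leftover;
}
```

The next read would then be something like in.read(buf.data() + kept, buf.size() - kept), leaving kept + in.gcount() valid bytes. If kept equals buf.size(), a single event is larger than the whole buffer and the buffer has to grow. Because only the incomplete tail is moved, the copy stays cheap as long as events are small relative to the buffer, which matches the condition given above.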

Beyond that:

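  • You'd be better off using std::vector<char> for the buffer.
  • You can't read four bytes from disk into an unsigned int simply by casting its address; you have to insert each of the bytes into the unsigned int at the position where it belongs.
  • Finally: you don't check that the read succeeded before processing the data. Using unformatted input with istream is a bit tricky: your loop should probably be something like while ( inFile.read( addr, len ) || inFile.gcount() != 0 )...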
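The points above can be combined into one minimal sketch. It assumes the file stores 32-bit words least-significant byte first (check the actual format); the function names are hypothetical, and collecting the raw words stands in for real event decoding. Trailing bytes that do not form a whole word are left unprocessed.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>   // std::stringstream, handy for trying readWords on in-memory bytes
#include <vector>

// Build a 32-bit value from 4 bytes, assuming little-endian order in the file.
// The unsigned char casts prevent sign extension of bytes >= 0x80.
std::uint32_t wordFromBytes(const char* p) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(p[0]))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(p[1])) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(p[2])) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(p[3])) << 24);
}

// Read the stream in chunks of `chunk` bytes (chunk must be >= 4) and return
// every complete 32-bit word. Only bytes actually delivered are processed.
std::vector<std::uint32_t> readWords(std::istream& in, std::size_t chunk) {
    std::vector<char> buf(chunk);
    std::vector<std::uint32_t> words;
    std::size_t pending = 0;  // bytes of a partial word carried into the next read
    // The final, short read() sets failbit, but gcount() still reports the bytes
    // it delivered; the next read() then extracts nothing and the loop ends.
    while (in.read(buf.data() + pending,
                   static_cast<std::streamsize>(buf.size() - pending))
           || in.gcount() != 0) {
        const std::size_t filled = pending + static_cast<std::size_t>(in.gcount());
        std::size_t pos = 0;
        for (; pos + 4 <= filled; pos += 4) {
            words.push_back(wordFromBytes(buf.data() + pos));
        }
        pending = filled - pos;                    // 0 to 3 leftover bytes
        for (std::size_t i = 0; i < pending; ++i) {
            buf[i] = buf[pos + i];                 // move them to the front
        }
    }
    return words;
}
```

The chunk size must be at least 4; otherwise a carried-over partial word could fill the whole buffer, the next read() would request zero bytes, and the loop would never end.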