Efficient approaches for parsing objects from consecutive fixed size buffers that don't align with object size

Keywords: objects, efficient, buffers, approaches, consecutive, misaligned. Updated: 2023-10-16

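I am trying to implement something in C++ where I have an API that reads objects from a byte array, and the array I pass in is limited to a fixed size. After parsing out one complete object, the API knows the pointer position where it finished reading (the start of the next object, which begins in the current byte array but is not complete).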

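I then just need to concatenate the remaining bytes with the next array of the same fixed size, and start reading a new object at the pointer position as if it were the start of the new array.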

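I am new to C++. The approach below works, but it looks rather cumbersome and inefficient: it needs three vectors and a lot of clearing, reserving, and inserting. Is there an alternative that is more efficient, or at least equally efficient but with cleaner code? I have been reading about things like stringstream, but they don't seem to need fewer memory copies (probably more, since my API requires a byte array to be passed in). Thanks.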

std::vector<char> checkBuffer;
std::vector<char> remainingBuffer;
std::vector<char> readBuffer(READ_BUFFER_SIZE);
std::size_t pointerPosition = 0;  // type assumed; use whatever parse() expects
// Loop while there is still data to read from the input stream
while (in.good()) {
    in.read(readBuffer.data(), READ_BUFFER_SIZE);
    // The last read may come up short, so use only the bytes actually read
    const auto bytesRead = static_cast<std::size_t>(in.gcount());
    // This is the holding buffer the API parses objects from
    checkBuffer.clear();
    // Concatenate what is left in remainingBuffer (initially empty)
    // with what was newly read from the input into readBuffer
    checkBuffer.reserve(remainingBuffer.size() + bytesRead);
    checkBuffer.insert(checkBuffer.end(), remainingBuffer.begin(),
                       remainingBuffer.end());
    checkBuffer.insert(checkBuffer.end(), readBuffer.begin(),
                       readBuffer.begin() + bytesRead);
    // Call the API here; it also reports, via pointerPosition, where it
    // stopped inside the buffer when it finished reading the object
    Object parsedObject = parse(checkBuffer, &pointerPosition);
    // Then calculate the number of bytes not yet read in checkBuffer
    const std::size_t remainingBufSize = checkBuffer.size() - pointerPosition;
    remainingBuffer.clear();
    remainingBuffer.reserve(remainingBufSize);
    // Then copy whatever remains in checkBuffer into remainingBuffer
    // so that it is used in the next iteration
    remainingBuffer.insert(remainingBuffer.end(),
                           checkBuffer.begin() + pointerPosition,
                           checkBuffer.end());
}

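Write append_chunk_into(in, vect). It appends one chunk of data to the end of vect, resizing as needed. As an aside, a char-sized standard-layout struct that does not zero its memory might be a better element type than char, because resizing a vector of char zero-fills the new bytes before they are overwritten.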

To append to the end:

size_t old_size=vect.size();
vect.resize(vect.size()+new_bytes);
in.read(vect.data()+old_size, new_bytes);

Or whatever the API for reading is.
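As a rough sketch of how such a helper could look, assuming the input is a std::istream: the function name comes from the answer, but the signature, the max_bytes parameter, and the demo_append function are assumptions made for illustration. Unlike the three-line fragment above, it trims the vector again when the read comes up short.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: appends up to max_bytes from `in` to the end of `vect`
// and returns the number of bytes actually appended.
std::size_t append_chunk_into(std::istream& in, std::vector<char>& vect,
                              std::size_t max_bytes) {
    const std::size_t old_size = vect.size();
    vect.resize(old_size + max_bytes);  // make room for a full chunk
    in.read(vect.data() + old_size, static_cast<std::streamsize>(max_bytes));
    const auto got = static_cast<std::size_t>(in.gcount());
    assert(got <= max_bytes);
    vect.resize(old_size + got);  // trim if the read came up short
    return got;
}

// Small demonstration on an in-memory stream: reads 7 bytes in chunks of 4.
std::size_t demo_append() {
    std::istringstream in("abcdefg");
    std::vector<char> buf;
    while (append_chunk_into(in, buf, 4) > 0) {
    }
    return buf.size();
}
```

Returning the byte count lets the caller detect the end of the input without inspecting the stream state separately.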

To parse, feed it vect.data(). Call the pointer it returns when it finishes ptr.

Then `vect.erase(vect.begin(), vect.begin() + (ptr - vect.data()))` removes the parsed bytes. (Only do this once you have parsed everything you can from the buffer, to avoid wasted memory moves.)

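One vector. It will reuse its memory, and it will never grow beyond read size + largest object size - 1, so you can reserve that capacity up front.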
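Putting the append, parse, and erase steps together, a single-vector loop might look like the sketch below. The real parsing API is not shown in the question, so parse_one is a hypothetical stand-in that assumes a made-up record format (one length byte followed by that many payload bytes); parse_stream and demo_parse_stream are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsing API: a record is one length byte
// followed by that many payload bytes. Returns the position just past the
// record, or nullptr if [begin, end) does not yet hold a complete record.
const char* parse_one(const char* begin, const char* end, std::string& out) {
    if (begin == end) return nullptr;
    const std::size_t len = static_cast<unsigned char>(*begin);
    if (static_cast<std::size_t>(end - begin) < 1 + len) return nullptr;
    out.assign(begin + 1, begin + 1 + len);
    return begin + 1 + len;
}

std::vector<std::string> parse_stream(std::istream& in, std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<std::string> objects;
    std::vector<char> buf;
    buf.reserve(chunk_size + 255);  // read size + largest record (256) - 1
    while (in) {
        // Append one chunk behind whatever was left over from the last pass.
        const std::size_t old_size = buf.size();
        buf.resize(old_size + chunk_size);
        in.read(buf.data() + old_size, static_cast<std::streamsize>(chunk_size));
        buf.resize(old_size + static_cast<std::size_t>(in.gcount()));

        // Parse every complete record currently in the buffer.
        const char* ptr = buf.data();
        const char* const end = buf.data() + buf.size();
        std::string obj;
        while (const char* next = parse_one(ptr, end, obj)) {
            objects.push_back(obj);
            ptr = next;
        }
        // Erase the consumed prefix once per chunk, not once per record.
        buf.erase(buf.begin(), buf.begin() + (ptr - buf.data()));
    }
    return objects;
}

// Usage on an in-memory stream holding the records "abc" and "hi".
std::vector<std::string> demo_parse_stream() {
    std::istringstream in(std::string("\x03" "abc" "\x02" "hi"));
    return parse_stream(in, 4);
}
```

Because the leftover is always shorter than one record, the reserved capacity is never exceeded and the vector does not reallocate inside the loop.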

But really, most of the time is usually spent on I/O, so focus your optimization on keeping the data flowing.

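If I were in your position, I would keep only readBuffer and reserve READ_BUFFER_SIZE + sizeof(LargestMessage) for it. After parsing, you get back a pointer to the last thing the API was able to read in the vector. I would then turn the end of the vector into a pointer and use it to bound the data that has to be copied to the head of the vector. Use readBuffer.data() + readBuffer.size() for this rather than &*readBuffer.end(), because dereferencing the end iterator is undefined behavior. Once that data is at the head of the vector, you can call read with the same arguments, except that the destination is offset by the number of leftover bytes. You do need some way to determine how many bytes are in the leftover portion, but that should not be insurmountable.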
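A minimal sketch of this single-buffer idea follows, under the same kind of assumptions as before: parse_record is a hypothetical stand-in parser for a made-up format (one length byte plus payload), kMaxRecord plays the role of sizeof(LargestMessage), and parse_with_fixed_buffer and demo_fixed_buffer are illustrative names. The leftover count is simply tracked in a local variable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Assumed largest record: 1 length byte + up to 255 payload bytes.
constexpr std::size_t kMaxRecord = 256;

// Hypothetical stand-in parser: returns the position just past one complete
// record, or nullptr if [begin, end) does not yet hold a complete record.
const char* parse_record(const char* begin, const char* end, std::string& out) {
    if (begin == end) return nullptr;
    const std::size_t len = static_cast<unsigned char>(*begin);
    if (static_cast<std::size_t>(end - begin) < 1 + len) return nullptr;
    out.assign(begin + 1, begin + 1 + len);
    return begin + 1 + len;
}

std::vector<std::string> parse_with_fixed_buffer(std::istream& in,
                                                 std::size_t read_size) {
    assert(read_size > 0);
    std::vector<std::string> objects;
    std::vector<char> buf(read_size + kMaxRecord);  // sized once, never resized
    std::size_t leftover = 0;  // unparsed bytes sitting at the head of buf
    while (in) {
        // Read the new chunk directly behind the leftover bytes.
        in.read(buf.data() + leftover, static_cast<std::streamsize>(read_size));
        const std::size_t filled = leftover + static_cast<std::size_t>(in.gcount());

        const char* ptr = buf.data();
        const char* const end = buf.data() + filled;  // end of the valid data
        std::string obj;
        while (const char* next = parse_record(ptr, end, obj)) {
            objects.push_back(obj);
            ptr = next;
        }
        // Move the unparsed tail to the head of the buffer for the next pass.
        leftover = static_cast<std::size_t>(end - ptr);
        std::memmove(buf.data(), ptr, leftover);
    }
    return objects;
}

// Usage on an in-memory stream holding the records "abc" and "hi".
std::vector<std::string> demo_fixed_buffer() {
    std::istringstream in(std::string("\x03" "abc" "\x02" "hi"));
    return parse_with_fixed_buffer(in, 4);
}
```

std::memmove is used instead of std::memcpy because the source and destination ranges can overlap when the leftover is larger than the consumed prefix.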