Taking over memory from std::vector

Keywords: memory, vector, std. Last updated: 2023-10-16

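I use an external library to process large amounts of data. The data is passed in as a raw pointer plus a length. The library does not claim ownership of the pointer, but it invokes a provided callback function (with the same two arguments) when it is done with the data.

Preparing the data is convenient with std::vector<T>, and I would rather not give up that convenience. Copying the data is completely out of the question. So I need a way to "take over" the memory buffer owned by a std::vector<T> and (later on) deallocate it in the callback.

My current solution looks as follows: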

std::vector<T> input = prepare_input();
T * data = input.data();
size_t size = input.size();
// move the vector to "raw" storage, to prevent deallocation
alignas(std::vector<T>) char temp[sizeof(std::vector<T>)];
new (temp) std::vector<T>(std::move(input));
// invoke the library
lib::startProcesing(data, size);

And, in the callback function:

void callback(T * data, size_t size) {
    std::allocator<T>().deallocate(data, size);
}

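This solution works because the standard allocator's deallocate function ignores its second argument (the element count) and simply calls ::operator delete(data). If it did not, bad things could happen, because the size of the input vector may be much smaller than its capacity.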
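To make the size versus capacity concern concrete, here is a minimal sketch. The helper names unused_slots and make_sparse_input are made up for illustration and are not part of the code above. It only shows that a vector can hold far more allocated slots than elements, which is why passing size() where the allocator expects the allocated count would be wrong on an implementation that actually uses that argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of allocated but unused element slots.
std::size_t unused_slots(const std::vector<int>& v) {
    assert(v.capacity() >= v.size());  // always holds for std::vector
    return v.capacity() - v.size();
}

// Hypothetical helper: builds a vector whose capacity exceeds its size.
std::vector<int> make_sparse_input() {
    std::vector<int> input;
    input.reserve(1000);  // capacity() is now at least 1000
    input.push_back(42);  // size() becomes 1
    return input;
}
```

Calling unused_slots(make_sparse_input()) would typically report 999 or more unused slots, even though size() is only 1.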

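My question is: is there a reliable (with respect to the C++ standard) way of taking over the buffer of a std::vector and releasing it "manually" at some later time?

You can't take ownership of the memory from a vector, but you can solve your underlying problem another way.

Here's how I'd approach it - it's a bit hacky because of the static global variable, and it is not thread safe, but it can be made so with some simple locking around accesses to the registry object.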

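#include <map>
#include <vector>
static std::map<T*, std::vector<T>*> registry;
void my_startProcessing(std::vector<T> * data) {
  // Remember which vector owns this buffer before handing it to the library.
  registry[data->data()] = data;
  lib::startProcesing(data->data(), data->size());
}
void my_callback(T * data, size_t length) {
  auto it = registry.find(data);
  if (it == registry.end()) return;  // not a buffer we registered
  delete it->second;  // destroys the vector, which frees the buffer
  registry.erase(it);
}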

Now you can do

std::vector<T> * input = ...
my_startProcessing(input);

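But be careful! If you add or remove elements on the input after calling my_startProcessing, bad things will happen - the buffer the library holds may be invalidated by a reallocation. (You may be allowed to change values in the vector, as I believe that will write through to the data correctly, but that also depends on what the library allows.)

This also does not work if T = bool, because std::vector<bool> is a packed specialization and has no usable data() member.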
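The locking mentioned above could look roughly like the following sketch. Everything here is an assumption made for illustration: fake_lib::process stands in for the real library, the element type is fixed to int, and BufferRegistry, registry, on_done and start are invented names. Unlike the version above, the registry stores the vectors by value, so no manual delete is needed. It also assumes non-empty inputs, since empty vectors may all report the same data() pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the external library (hypothetical, not the real API).
namespace fake_lib {
using Callback = void (*)(int* data, std::size_t size);
inline void process(int* data, std::size_t size, Callback done) {
    done(data, size);  // a real library would call this later, possibly on another thread
}
}  // namespace fake_lib

// Registry keyed by buffer address. It owns the vectors by value, so
// erasing an entry frees the buffer through the vector's own destructor.
class BufferRegistry {
public:
    // Takes ownership of `input`; returns the pointer to hand to the library.
    int* adopt(std::vector<int>&& input) {
        std::lock_guard<std::mutex> lock(mutex_);
        int* key = input.data();                  // stays valid across the move below
        buffers_.emplace(key, std::move(input));
        return key;
    }
    // Releases the buffer; returns false if the pointer was never registered.
    bool release(int* data) {
        std::lock_guard<std::mutex> lock(mutex_);
        return buffers_.erase(data) == 1;
    }
    // Number of buffers currently held for the library.
    std::size_t pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return buffers_.size();
    }
private:
    std::mutex mutex_;
    std::unordered_map<int*, std::vector<int>> buffers_;
};

inline BufferRegistry& registry() {
    static BufferRegistry instance;  // initialization is thread safe since C++11
    return instance;
}

// Callback given to the library: drops the vector that owns `data`.
void on_done(int* data, std::size_t /*size*/) {
    registry().release(data);
}

// Hands `input` to the library without copying its elements.
void start(std::vector<int> input) {
    std::size_t size = input.size();  // read before the vector is moved away
    int* data = registry().adopt(std::move(input));
    fake_lib::process(data, size, &on_done);
}
```

A caller would write something like start(prepare_input()); the buffer then lives in the registry until the library reports it back through on_done.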

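You could create a custom class built on top of a vector.

The key point here is to use move semantics in the SomeData constructor.

You get the prepared data without copying (note that the source vector will be left empty)
The data will be disposed of correctly by the destructor of the thisData vector
The source vector can be destroyed without any issue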
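The first point can be checked with a small sketch. The helper name move_keeps_buffer is hypothetical and the element type is fixed to int for brevity. It assumes the default allocator, for which move construction hands the existing heap buffer to the new vector rather than copying elements.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: moves `source` into a new vector and reports
// whether the new vector reuses the very same heap buffer.
bool move_keeps_buffer(std::vector<int>& source) {
    const int* before = source.data();
    std::vector<int> target(std::move(source));  // move construction, no element copies
    return target.data() == before;
}
```

After the call, the source vector is empty and can be destroyed or reused freely, which matches the last two points in the list.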

Since the underlying storage is a contiguous array, you can compute the start pointer and the data size (see SomeDataImpl.h below):

SomeData.h

#pragma once
#include <vector>
template<typename T>
class SomeData
{
    std::vector<T> thisData;
public:
    SomeData(std::vector<T> && other);
    const T* Start() const;
    size_t Size() const;
};
#include "SomeDataImpl.h"

SomeDataImpl.h

#pragma once
template<typename T>
SomeData<T>::SomeData(std::vector<T> && otherData) : thisData(std::move(otherData)) { }
template<typename T>
const T* SomeData<T>::Start() const {
    return thisData.data();
}
template<typename T>
size_t SomeData<T>::Size() const {
    return sizeof(T) * thisData.size();
}

Usage example:

#include <iostream>
#include "SomeData.h"
template<typename T>
void Print(const T * start, size_t size) {
    size_t toPrint = size / sizeof(T);
    size_t printed = 0;
    while(printed < toPrint) {
        std::cout << *(start + printed) << ", " << start + printed << std::endl;
        ++printed;
    }
}
int main () {
    std::vector<int> ints;
    ints.push_back(1);
    ints.push_back(2);
    ints.push_back(3);
    SomeData<int> someData(std::move(ints));
    Print<int>(someData.Start(), someData.Size());
    return 0;
}

You can't do this in any portable way, but you can do it in a way that will probably work on most C++ implementations. This code seemed to work after a quick test on VS 2017.

#include <iostream>
#include <vector>
using namespace std;
template <typename T>
T* HACK_stealVectorMemory(vector<T>&& toStealFrom)
{
    // Get a pointer to the vector's memory allocation
    T* vectorMemory = toStealFrom.data();
    // Construct an empty vector in suitably aligned stack memory using placement new
    alignas(vector<T>) unsigned char buffer[sizeof(vector<T>)];
    vector<T>* fakeVector = new (buffer) vector<T>();
    // Move the memory pointer from toStealFrom into our fakeVector, which will never be destroyed.
    (*fakeVector) = std::move(toStealFrom);
    return vectorMemory;
}
int main()
{
    vector<int> someInts = { 1, 2, 3, 4 };
    cout << someInts.size() << endl;
    int* intsPtr = HACK_stealVectorMemory(std::move(someInts));
    cout << someInts.size() << endl;
    cout << intsPtr[0] << ", " << intsPtr[3] << endl;
    // Non-portable: assumes the default allocator got this block from ::operator new.
    ::operator delete(intsPtr);
}

Output:

4
0
1, 4