Multithreaded C++: force read from memory, bypassing cache

Keywords: cache, read, memory, multithreading. Updated: 2023-10-16

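I'm working on a game engine in my hobby time, and I'm currently building a multithreaded batch executor. I originally used a concurrent lock-free queue and std::function all over the place to handle communication between the master thread and the slave threads, but I decided to scrap that in favor of something more lightweight that gives me tight control over memory allocation: function pointers and memory pools.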

Anyway, I've run into a problem:

No matter what I try, the function pointer is read correctly by only one thread, while the other threads read a null pointer and fail the assert.

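I'm fairly sure this is a caching problem. I have confirmed that all threads see the same address for the pointer. I've tried declaring it as volatile, as intptr_t, and as std::atomic, and I've tried all kinds of casting-fu, but the threads all seem to ignore it and keep reading their cached copies.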

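I've modeled the master and slave in a model checker to make sure the concurrency is sound and that there is no livelock or deadlock (provided the shared variables are all synchronized correctly).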

void Executor::operator() (int me) {
    while (true) {
        printf("Slave %d waiting.\n", me);
        {
            std::unique_lock<std::mutex> lock(batch.ready_m);
            while(!batch.running) batch.ready.wait(lock);
            running_threads++;
        }
        printf("Slave %d running.\n", me);
        BatchFunc func = batch.func;
        assert(func != nullptr);
        int index;
        if (batch.store_values) {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void* data = reinterpret_cast<void*>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, data);
            }
        }
        else {
            while ((index = batch.item.fetch_add(1)) < batch.n_items) {
                void** data = reinterpret_cast<void**>(batch.data_buffer + index * batch.item_size);
                func(batch.share_data, *data);
            }
        }
        // at least one thread finished, so make sure we won't loop back around
        batch.running = false;
        if (running_threads.fetch_sub(1) == 1) { // I am the last one
            batch.done = true; // therefore all threads are done
            batch.complete.notify_all();
        }
    }
}
void Executor::run_batch() {
    assert(!batch.running);
    if (batch.func == nullptr || batch.n_items == 0) return;
    batch.item.store(0);
    batch.running = true;
    batch.done = false;
    batch.ready.notify_all();
    printf("Master waiting.\n");
    {
        std::unique_lock<std::mutex> lock(batch.complete_m);
        while (!batch.done) batch.complete.wait(lock);
    }
    printf("Master ready.\n");
    batch.func = nullptr;
    batch.n_items = 0;
}

batch.func is set by another function:

template<typename SharedT, typename ItemT>
void set_batch_job(void(*func)(const SharedT*, ItemT*), const SharedT& share_data, bool byValue = true) {
    static_assert(sizeof(SharedT) <= SHARED_DATA_MAXSIZE, "Shared data too large");
    static_assert(std::is_pod<SharedT>::value, "Shared data type must be POD");
    assert(std::is_pod<ItemT>::value || !byValue);
    assert(!batch.running);
    batch.func = reinterpret_cast<volatile BatchFunc>(func);
    memcpy(batch.share_data, (void*) &share_data, sizeof(SharedT));
    batch.store_values = byValue;
    if (byValue) {
        batch.item_size = sizeof(ItemT);
    }
    else { // store pointers instead of values
        batch.item_size = sizeof(ItemT*);
    }
    batch.n_items = 0;
}

Here is the struct (and typedef) it operates on:

typedef void(*BatchFunc)(const void*, void*);
struct JobBatch {
    volatile BatchFunc func;
    void* const share_data = operator new(SHARED_DATA_MAXSIZE);
    intptr_t const data_buffer = reinterpret_cast<intptr_t>(operator new (EXEC_DATA_BUFFER_SIZE));
    volatile size_t item_size;
    std::atomic<int> item; // Index into the data array
    volatile int n_items = 0;
    std::condition_variable complete; // slave -> master signal
    std::condition_variable ready;    // master -> slave signal
    std::mutex complete_m;
    std::mutex ready_m;
    bool store_values = false;
    volatile bool running = false; // there is work to do in the batch
    volatile bool done = false;    // there is no work left to do
    JobBatch();
} batch;

How can I make sure that all the necessary reads and writes to batch.func are correctly synchronized between threads?

Just in case it matters: I'm using Visual Studio and compiling an x64 Debug Windows executable. Intel i5, Windows 10, 8GB RAM.

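So I did a little reading on the C++ memory model and managed to hack together a solution using atomic_thread_fence. Everything is probably super broken, because I'm crazy and shouldn't be rolling my own system here, but hey, learning is fun!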

Basically, whenever you have finished writing things that you want other threads to see, you need to call atomic_thread_fence(std::memory_order_release).

On the receiving thread(s), you call atomic_thread_fence(std::memory_order_acquire) before reading the shared data.

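In my case, the release should be done immediately before waiting on a condition variable, and the acquire should be done immediately before using data written by other threads.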

This ensures that writes made on one thread are seen by the others.
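The release/acquire pairing described above can be sketched roughly as follows. This is only a minimal illustration, not the executor itself: `Job`, `publish`, `consume`, `add_shared` and `run_fence_demo` are hypothetical names made up for the example. It also assumes one extra ingredient: as far as the C++ standard is concerned, two fences only synchronize with each other through an atomic object that the writer stores after its release fence and the reader loads before its acquire fence, so the sketch uses a `std::atomic<bool>` flag for that purpose.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

using BatchFunc = void (*)(const void*, void*);

// Hypothetical job function, used only for this illustration.
void add_shared(const void* shared, void* item) {
    *static_cast<int*>(item) += *static_cast<const int*>(shared);
}

struct Job {
    BatchFunc func = nullptr;            // plain, non-atomic payload
    std::atomic<bool> published{false};  // atomic flag the two fences pair through
};

// Writer side: write the payload, issue the release fence, then set the flag.
void publish(Job& job, BatchFunc f) {
    job.func = f;
    std::atomic_thread_fence(std::memory_order_release);
    job.published.store(true, std::memory_order_relaxed);
}

// Reader side: observe the flag, issue the acquire fence, then read the payload.
BatchFunc consume(Job& job) {
    while (!job.published.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return job.func;  // guaranteed to see the write made before the release fence
}

// Runs one writer and one reader; returns the item after the job ran on it.
int run_fence_demo() {
    Job job;
    const int shared = 40;
    int item = 2;
    std::thread reader([&job, &shared, &item] {
        BatchFunc f = consume(job);
        assert(f != nullptr);
        f(&shared, &item);
    });
    publish(job, add_shared);
    reader.join();  // join makes the reader's write to item visible here
    return item;
}
```

Calling `run_fence_demo()` from `main` exercises both sides. Storing the flag with `memory_order_release` and loading it with `memory_order_acquire` would give the same guarantee without separate fences, which is usually the simpler choice.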

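I'm no expert, so this is probably not the right way to tackle the problem, and it will likely meet with certain doom. For example, I still have a deadlock/livelock problem to sort out.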

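tl;dr: this isn't exactly a cache thing: threads are not guaranteed to have their data fully synchronized with each other unless you enforce it.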
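For comparison, here is a sketch of a different technique from the fences used above: handing the function pointer over under a single mutex. It assumes the pointer and the wait predicate are both written and read only while holding the same mutex that the condition variable waits on, in which case the lock and unlock operations provide the ordering and no volatile, atomic or explicit fence is needed. `Mailbox`, `post_job`, `wait_for_job`, `bump` and `run_mailbox_demo` are hypothetical names for this example, and it covers only the master-to-worker hand-off, not batch completion.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

using BatchFunc = void (*)(const void*, void*);

// Hypothetical job function, used only for this illustration.
void bump(const void* shared, void* item) {
    *static_cast<int*>(item) += *static_cast<const int*>(shared);
}

struct Mailbox {
    std::mutex m;
    std::condition_variable ready;
    BatchFunc func = nullptr;  // guarded by m
    bool has_job = false;      // wait predicate, also guarded by m
};

// Master side: change the shared state under the lock, then notify.
void post_job(Mailbox& box, BatchFunc f) {
    {
        std::lock_guard<std::mutex> lock(box.m);
        box.func = f;
        box.has_job = true;
    }
    box.ready.notify_all();
}

// Worker side: wait on the predicate and read the pointer under the same lock.
BatchFunc wait_for_job(Mailbox& box) {
    std::unique_lock<std::mutex> lock(box.m);
    box.ready.wait(lock, [&box] { return box.has_job; });
    return box.func;
}

// Starts n_workers threads that each run the posted job on their own item.
int run_mailbox_demo(int n_workers) {
    Mailbox box;
    const int shared = 1;
    std::vector<int> items(static_cast<std::size_t>(n_workers), 0);
    std::vector<std::thread> workers;
    for (int i = 0; i < n_workers; ++i) {
        workers.emplace_back([&box, &shared, &items, i] {
            BatchFunc f = wait_for_job(box);
            assert(f != nullptr);
            f(&shared, &items[static_cast<std::size_t>(i)]);
        });
    }
    post_job(box, bump);
    for (auto& t : workers) t.join();
    int total = 0;
    for (int v : items) total += v;
    return total;  // one increment per worker
}
```

Because every worker takes the lock before checking `has_job`, a worker that starts waiting after `post_job` has already run still sees the job instead of missing the notification.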