Stack overflow when trying to implement lock-free queue

Tags: stack, stack overflow, queue, implementation. Updated: 2023-10-16

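I implemented a lock-free queue based on the algorithm described in Maged M. Michael and Michael L. Scott's paper "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms" (for the algorithm itself, skip to page 4).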

I use the atomic operations for shared_ptr, such as std::atomic_load_explicit.

When the queue is used from a single thread everything works fine, but when I use it from several threads I get a stack overflow exception.

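Unfortunately, I can't find the source of the problem. It looks as if, when a shared_ptr goes out of scope, it decrements the reference count of the next ConcurrentQueueNode and this causes an infinite recursion, but I don't understand why.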

Code:

Queue node:

```cpp
#include <memory>
#include <utility>

template<class T>
struct ConcurrentQueueNode {
    T m_Data;
    std::shared_ptr<ConcurrentQueueNode> m_Next;

    template<class ... Args>
    ConcurrentQueueNode(Args&& ... args) :
        m_Data(std::forward<Args>(args)...) {}

    std::shared_ptr<ConcurrentQueueNode>& getNext() {
        return m_Next;
    }

    // Moves the payload out of the node.
    T getValue() {
        return std::move(m_Data);
    }
};
```

Concurrent queue (note: not for the faint of heart!):

```cpp
#include <atomic>
#include <memory>
#include <utility>

template<class T>
class ConcurrentQueue {
    std::shared_ptr<ConcurrentQueueNode<T>> m_Head, m_Tail;
public:
    ConcurrentQueue() {
        // Start with a dummy node, as in the Michael-Scott algorithm.
        m_Head = m_Tail = std::make_shared<ConcurrentQueueNode<T>>();
    }

    template<class ... Args>
    void push(Args&& ... args) {
        auto node = std::make_shared<ConcurrentQueueNode<T>>(std::forward<Args>(args)...);
        std::shared_ptr<ConcurrentQueueNode<T>> tail;
        for (;;) {
            tail = std::atomic_load_explicit(&m_Tail, std::memory_order_acquire);
            std::shared_ptr<ConcurrentQueueNode<T>> next =
                std::atomic_load_explicit(&tail->getNext(), std::memory_order_acquire);
            if (tail == std::atomic_load_explicit(&m_Tail, std::memory_order_acquire)) {
                if (next.get() == nullptr) {
                    // Try to link the new node after the last node.
                    auto res = std::atomic_compare_exchange_weak(&tail->getNext(), &next, node);
                    if (res) {
                        break;
                    }
                }
                else {
                    // m_Tail is lagging behind; try to swing it forward.
                    std::atomic_compare_exchange_weak(&m_Tail, &tail, next);
                }
            }
        }
        // Try to swing m_Tail to the inserted node.
        std::atomic_compare_exchange_strong(&m_Tail, &tail, node);
    }

    bool tryPop(T& dest) {
        std::shared_ptr<ConcurrentQueueNode<T>> head;
        for (;;) {
            head = std::atomic_load_explicit(&m_Head, std::memory_order_acquire);
            auto tail = std::atomic_load_explicit(&m_Tail, std::memory_order_acquire);
            auto next = std::atomic_load_explicit(&head->getNext(), std::memory_order_acquire);
            if (head == std::atomic_load_explicit(&m_Head, std::memory_order_acquire)) {
                if (head.get() == tail.get()) {
                    if (next.get() == nullptr) {
                        return false;  // queue is empty
                    }
                    // m_Tail is lagging behind; try to swing it forward.
                    std::atomic_compare_exchange_weak(&m_Tail, &tail, next);
                }
                else {
                    dest = next->getValue();
                    auto res = std::atomic_compare_exchange_weak(&m_Head, &head, next);
                    if (res) {
                        break;
                    }
                }
            }
        }
        return true;
    }
};
```

Sample usage that reproduces the problem:

```cpp
#include <thread>

int main() {
    ConcurrentQueue<int> queue;
    std::thread threads[4];
    for (auto& thread : threads) {
        thread = std::thread([&queue] {
            for (auto i = 0; i < 100'000; i++) {
                queue.push(i);
                int y;
                queue.tryPop(y);
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return 0;
}
```

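The problem is a race condition that can leave every node in the queue waiting to be released at once. Releasing a node releases its m_Next, which may in turn release the following node, and so on, so the release is recursive and blows the stack.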

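If you change the test to use a single thread that only pushes and never pops, you get the same stack overflow every time: when the queue is destroyed, the chain of about 100,000 nodes is released recursively.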

```cpp
for (auto i = 1; i < 100000; i++) {
    queue.push(i);
    //int y;
    //queue.tryPop(y);
}
```

You need to make the deletion of the node chain non-recursive:

```cpp
// Members to add to ConcurrentQueueNode:
~ConcurrentQueueNode() {
    // Nothing to do if there is no successor, or if someone else still owns it.
    if (!m_Next || m_Next.use_count() > 1)
        return;
    KillChainOfDeath();
}
void KillChainOfDeath() {
    auto pThis = this;
    std::shared_ptr<ConcurrentQueueNode> Next, Prev;
    while (1) {
        if (pThis->m_Next.use_count() > 1)
            break;
        Next.swap(pThis->m_Next); // unwire node
        Prev.reset(); // free previous node that we unwired in previous loop
        if (!(pThis = Next.get())) // move to next node
            break;
        Prev.swap(Next); // else Next.swap will free before unwire.
    }
}
```

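I've never used shared_ptr before, so I don't know whether there is a faster way. For the same reason, I don't know whether your algorithm is vulnerable to the ABA problem. Unless the shared_ptr implementation has something special that prevents ABA, I'm worried that a previously freed node could be reused and fool the CAS. That said, I never seemed to hit that problem when I ran your code.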