CPU instruction reordering

Keywords: CPU, instruction reordering. Updated: 2023-10-16

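Our processors can reorder instructions to gain some performance, but this can lead to some strange behavior. I am trying to reproduce one of these issues, based on this article.

Here is my code: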

#include <iostream>
#include <mutex>
#include <thread>
using namespace std;

int a,b;
int r1,r2;
mutex m1,m2;
void th1()
{
    for(;;)
    {
        m1.lock();
        a=1;
        asm volatile("" ::: "memory"); // compiler barrier only; the CPU may still reorder
        r1=b;
        m1.unlock();
    }
}
void th2()
{
    for(;;)
    {
        m2.lock();
        b=1;
        asm volatile("" ::: "memory"); // compiler barrier only; the CPU may still reorder
        r2=a;
        m2.unlock();
    }
}
int main()
{
    int cnt{0};
    thread thread1{th1};
    thread thread2{th2};
    thread1.detach();
    thread2.detach();
    for(int i=0;i<10000;i++)
    {
        m1.lock();
        m2.lock();
        a=b=0;
        m1.unlock();
        m2.unlock();
        if(r1==0&&r2==0)
        {
            ++cnt;
        }
    }
    cout<<cnt<<" CPU reorders happened!\n";
}

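I use mutexes to make sure the "main" thread does not modify a or b while th1 or th2 is doing its work. The output changes from run to run: it may be 0, it may be 10000, or it may be any number in between.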

Something about this code makes me a little uneasy, and I am not sure it really reproduces the CPU reordering phenomenon.

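From the code, it seems that r1 and r2 get their values only from th1 and th2, which set them to the values of "b" and "a". Within th1 and th2 those values cannot both be 0 because of the locking, so the only way for both r1 and r2 to be 0 in the "if" is instruction reordering. Is that right?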
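
The reasoning above can be checked for one idealized round, assuming each thread runs its two statements exactly once, starting from a = b = 0, with no reordering at all. The sketch below enumerates every such interleaving; `sc_outcomes` is a hypothetical helper name, not something from the article. Under those assumptions the pair (0, 0) never shows up, but the program above does not guarantee those assumptions.

```cpp
#include <set>
#include <utility>

// Hypothetical helper: returns every (r1, r2) pair reachable when the two
// threads' statements are interleaved in program order (no reordering),
// starting from a = b = 0 and running each thread exactly once.
std::set<std::pair<int, int>> sc_outcomes()
{
    std::set<std::pair<int, int>> outcomes;
    // Bit k of mask says which thread executes step k (0 = th1, 1 = th2).
    for (int mask = 0; mask < 16; ++mask)
    {
        int a = 0, b = 0, r1 = -1, r2 = -1;
        int done1 = 0, done2 = 0;   // statements already executed per thread
        bool valid = true;          // false if a thread would run a third step
        for (int step = 0; step < 4 && valid; ++step)
        {
            if ((mask >> step) & 1)
            {
                if (done2 == 0) b = 1;          // th2: b = 1
                else if (done2 == 1) r2 = a;    // th2: r2 = a
                else valid = false;
                ++done2;
            }
            else
            {
                if (done1 == 0) a = 1;          // th1: a = 1
                else if (done1 == 1) r1 = b;    // th1: r1 = b
                else valid = false;
                ++done1;
            }
        }
        if (valid) outcomes.insert(std::make_pair(r1, r2));
    }
    return outcomes;
}
```

Calling `sc_outcomes()` from `main` and printing the pairs shows which results are possible without any reordering.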

Thanks

Your program is very different from the one in the preshing.com article you cite. The preshing.com program uses semaphores, whereas yours uses mutexes.

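Mutexes are simpler than semaphores. They make only one guarantee: only one thread can hold the lock on a mutex at a time. That is, they can only be used for mutual exclusion.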
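
As a minimal sketch of that single guarantee, the following uses a `std::mutex` to keep two threads from updating a counter at the same time. The function name `locked_increments` is made up for illustration. Note that the mutex says nothing about which thread runs when, only that the increments never overlap.

```cpp
#include <mutex>
#include <thread>

// Hypothetical helper: two threads each add `per_thread` to a shared counter.
// The mutex guarantees mutual exclusion, not any particular ordering.
long locked_increments(int per_thread)
{
    std::mutex m;
    long counter = 0;
    auto work = [&] {
        for (int i = 0; i < per_thread; ++i)
        {
            std::lock_guard<std::mutex> lock(m);  // one thread at a time
            ++counter;
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

For example, `locked_increments(100000)` returns 200000 no matter how the scheduler interleaves the two threads.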

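The preshing.com program uses its semaphores to do something that mutexes alone cannot: it synchronizes the loops in the three threads so that they all proceed in lockstep. Thread1 and Thread2 each wait at the top of their loop until main() lets them go, and then main waits at the bottom of its loop until they have finished their work. Then they all go around again.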

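You can't do that with mutexes. In your program, what prevents main from going around its loop thousands of times before either of the other two threads gets to run? Only chance. Nor does anything stop Thread1 and/or Thread2 from looping thousands of times while main() is blocked, waiting for its next time slice.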

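Remember, a semaphore is a counter. Look closely at how the semaphores in the preshing.com program are incremented and decremented by the threads, and you will see how they keep the threads in sync.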
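
The lockstep idea can be sketched with one worker and two counting semaphores. This is an illustration under my own assumptions, not the preshing.com code: `CountingSem` and `lockstep_holds` are hypothetical names, and the semaphore is built from a mutex and a condition variable.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal counting semaphore (illustrative; the class name is an assumption).
class CountingSem
{
    std::mutex mtx;
    std::condition_variable cv;
    int count = 0;
public:
    void post()   // increment the counter and wake one waiter
    {
        {
            std::lock_guard<std::mutex> lk(mtx);
            ++count;
        }
        cv.notify_one();
    }
    void wait()   // block until the counter is positive, then decrement it
    {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [this] { return count > 0; });
        --count;
    }
};

// Hypothetical helper: returns true if the worker ran exactly once per round.
bool lockstep_holds(int rounds)
{
    CountingSem begin, done;
    int worker_round = 0;
    std::thread worker([&] {
        for (int i = 0; i < rounds; ++i)
        {
            begin.wait();      // top of the loop: wait until main lets us go
            ++worker_round;    // the "work" for this round
            done.post();       // tell main this round is finished
        }
    });
    bool ok = true;
    for (int i = 1; i <= rounds; ++i)
    {
        begin.post();          // release the worker for one round
        done.wait();           // bottom of the loop: wait for the worker
        if (worker_round != i) ok = false;
    }
    worker.join();
    return ok;
}
```

Because main cannot start round i+1 until `done.wait()` returns, and the worker cannot start it until `begin.post()` is called, neither side can run ahead of the other.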

I made the mistake of using mutexes instead of semaphores (thanks, James Large). Here is the working code:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std;
class semaphore{
private:
    mutex mtx;
    condition_variable cv;
    int cnt;
public:
    semaphore(int count = 0):cnt(count){}
    void notify()
    {
        unique_lock<mutex> lck(mtx);
        ++cnt;
        cv.notify_one();
    }
    void wait()
    {
        unique_lock<mutex> lck(mtx);
        while(cnt == 0){
            cv.wait(lck);
        }
        --cnt;
    }
};
int a,b;
int r1,r2;
semaphore s1,s2,s3;
void th1()
{
    for(;;)
    {
        s1.wait();
        a=1;
        asm volatile("" ::: "memory"); // compiler barrier only; the CPU may still reorder
        r1=b;
        s3.notify();
    }
}
void th2()
{
    for(;;)
    {
        s2.wait();
        b=1;
        asm volatile("" ::: "memory"); // compiler barrier only; the CPU may still reorder
        r2=a;
        s3.notify();
    }
}
int main()
{
    int cnt{0};
    thread thread1{th1};
    thread thread2{th2};
    thread1.detach();
    thread2.detach();
    for(int i=0;i<100000;i++)
    {
        a=b=0;
        s1.notify();
        s2.notify();
        s3.wait();
        s3.wait();
        if(r1==0&&r2==0)
        {
            ++cnt;
        }
    }
    cout<<cnt<<" CPU reorders happened!\n";
}

The reordering now appears to be reproduced correctly.
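
For comparison, here is a sketch of how the same store/load pattern behaves if a and b are made `std::atomic<int>` and the default sequentially consistent ordering is used instead of a compiler-only barrier. This is my own variation, not the code above: `count_both_zero_seq_cst` is a hypothetical name, and it starts fresh threads each round rather than using semaphores, which keeps it short but makes the two threads overlap less often. With sequentially consistent atomics the C++ memory model does not allow both loads to return 0.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs `rounds` trials of the store/load test with
// sequentially consistent atomics and returns how often both loads saw 0.
int count_both_zero_seq_cst(int rounds)
{
    std::atomic<int> a{0}, b{0};
    int both_zero = 0;
    for (int i = 0; i < rounds; ++i)
    {
        a.store(0);
        b.store(0);
        int r1 = -1, r2 = -1;
        std::atomic<bool> go{false};
        std::thread t1([&] {
            while (!go.load()) std::this_thread::yield();  // wait for the start signal
            a.store(1);        // seq_cst store
            r1 = b.load();     // seq_cst load, cannot move ahead of the store
        });
        std::thread t2([&] {
            while (!go.load()) std::this_thread::yield();
            b.store(1);
            r2 = a.load();
        });
        go.store(true);        // release both threads at roughly the same time
        t1.join();             // join makes r1 and r2 safe to read below
        t2.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

Calling `count_both_zero_seq_cst(100000)` from `main` and printing the result should always give 0, whereas the version with only `asm volatile("" ::: "memory")` can report a nonzero count.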