Mutex Safety with Interrupts (Embedded Firmware)

Keywords: embedded, firmware, safety, interrupts. Updated: 2023-10-16

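Edit: @Mike pointed out that the try_lock function in the code below is unsafe, and that creating the accessors can also cause a race condition. The suggestions from everyone have convinced me that I was going down the wrong path.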

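Original Question

The requirements for locking on an embedded microcontroller are different enough from multithreading that I haven't been able to convert multithreading examples to my embedded applications. Typically I don't have an OS or threads of any kind, just main and whatever interrupt functions the hardware calls periodically.

I need to fill a buffer from an interrupt but process it in main, and this is a very common pattern for me. I've created the IrqMutex class below to try to implement this safely. Each party that tries to access the buffer is assigned a unique id through an IrqMutexAccessor, and each of them can then try_lock() and unlock(). The idea of a blocking lock() function doesn't work from interrupts, because unless you allow the interrupt to complete, no other code can execute, so the unlock() code would never run. I do, however, occasionally use a blocking lock from the main() code.

However, I know that double-checked locking doesn't work without C++11 memory barriers (which aren't available on many embedded platforms). Honestly, despite reading quite a lot about it, I don't really understand how or why memory access reordering can cause a problem. I think the use of volatile sig_atomic_t (possibly combined with the use of unique IDs) makes this different from a double-checked lock. But I'm hoping someone can: confirm that the following code is correct, explain why it isn't safe, or offer a better way to accomplish this.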

class IrqMutex {
friend class IrqMutexAccessor;
private:
    std::sig_atomic_t accessorIdEnum;
    volatile std::sig_atomic_t owner;
protected:
    std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }
    bool have_lock(std::sig_atomic_t accessorId) {
        return (owner == accessorId);
    }
    bool try_lock(std::sig_atomic_t accessorId) {
        // Only try to get a lock, while it isn't already owned.
        while (owner == SIG_ATOMIC_MIN) {
            // <-- If an interrupt occurs here, both attempts can get a lock at the same time.
            // Try to take ownership of this Mutex.
            owner = accessorId; // SET
            // Double check that we are the owner.
            if (owner == accessorId) return true;
            // Someone else must have taken ownership between CHECK and SET.
            // If they released it after CHECK, we'll loop back and try again.
            // Otherwise someone else has a lock and we have failed.
        }        
        // This shouldn't happen unless they called try_lock on something they already owned.
        if (owner == accessorId) return true;
        // If someone else owns it, we failed.
        return false;
    }
    bool unlock(std::sig_atomic_t accessorId) {
        // Double check that the owner called this function (not strictly required)
        if (owner == accessorId) {
            owner = SIG_ATOMIC_MIN;
            return true;
        }
        
        // We still return true if the mutex was unlocked anyway.
        return (owner == SIG_ATOMIC_MIN);
    }
public:
    IrqMutex(void) : accessorIdEnum(SIG_ATOMIC_MIN), owner(SIG_ATOMIC_MIN) {}
};
// This class is used to manage our unique accessorId.
class IrqMutexAccessor {
friend class IrqMutex;
private:
    IrqMutex& mutex;
    const std::sig_atomic_t accessorId;
public:
    IrqMutexAccessor(IrqMutex& m) : mutex(m), accessorId(m.nextAccessor()) {}
    bool have_lock(void) { return mutex.have_lock(accessorId); }
    bool try_lock(void) { return mutex.try_lock(accessorId); }
    bool unlock(void) { return mutex.unlock(accessorId); }
};

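Because there is only one processor and no threading, I think the mutex serves a subtly different purpose than normal. There are two main use cases that I run into repeatedly.

The interrupt is a producer that takes ownership of a free buffer and loads it with a packet of data. The interrupt/producer may keep its ownership lock for a long time, across multiple interrupt calls. The main function is the consumer and takes ownership of a full buffer when it is ready to process it. The race condition rarely happens, but if the interrupt/producer finishes with a packet and needs a new buffer while they are all full, it will try to take the oldest buffer (this is a dropped-packet event). If the main/consumer started to read and process that oldest buffer at exactly the same time, they would trample all over each other.
The interrupt is just a quick change or increment of something (such as a counter). However, if we want to reset the counter or jump to some new value by calling from the main() code, we don't want to write to the counter while it is changing. Here main actually does a blocking loop to obtain the lock, although I think it is almost impossible to ever wait here for more than two attempts. Once it has the lock, any calls to the counter interrupt will be skipped, but that is generally not a big deal for something like a counter. Then I update the counter value and unlock it so it can start incrementing again.

I realize these two examples are a bit vague, but some version of these patterns shows up in many of the peripherals in every project I work on, and I'd like one reusable piece of code that can handle this safely across various embedded platforms. I included the C tag because all of this is directly convertible to C code, and on some embedded compilers that is all that is available. So I'm trying to find a general method that is guaranteed to work in both C and C++.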

struct ExampleCounter {
    volatile long long int value;
    IrqMutex mutex;
} exampleCounter;
struct ExampleBuffer {
    volatile char data[256];
    volatile size_t index;
    IrqMutex mutex; // One mutex per buffer.
} exampleBuffers[2];
const volatile char * const REGISTER;
// This accessor shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutex(exampleCounter.mutex);
void __irqQuickFunction(void) {
    // Obtain a lock, add the data then unlock all within one function call.
    if (myMutex.try_lock()) {
        exampleCounter.value++;
        myMutex.unlock();
    } else {
        // If we failed to obtain a lock, we skipped this update this one time.
    }
}
// These accessors shouldn't be created in an interrupt or a race condition can occur.
static IrqMutexAccessor myMutexes[2] = {
    IrqMutexAccessor(exampleBuffers[0].mutex),
    IrqMutexAccessor(exampleBuffers[1].mutex)
};
void __irqLongFunction(void) {
    static size_t bufferIndex = 0;
    // Check if we have a lock.
    if (!myMutexes[bufferIndex].have_lock() && !myMutexes[bufferIndex].try_lock()) {
        // If we can't get a lock try the other buffer
        bufferIndex = (bufferIndex + 1) % 2;
        // One buffer should always be available so the next line should always be successful.
        if (!myMutexes[bufferIndex].try_lock()) return;
    }
    
    // ... at this point we know we have a lock ...
    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    exampleBuffers[bufferIndex].data[exampleBuffers[bufferIndex].index++] = c;
    // We may keep the lock for multiple function calls until the end of packet.
    static const char END_PACKET_SIGNAL = '\0'; // Assumed value; use the protocol's real end-of-packet byte.
    if (c == END_PACKET_SIGNAL) {
        // Unlock this buffer so it can be read from main.
        myMutexes[bufferIndex].unlock();
        // Switch to the other buffer for next time.
        bufferIndex = (bufferIndex + 1) % 2;
    }
}
int main(void) {
    while (true) {
        // Mutex for counter
        static IrqMutexAccessor myCounterMutex(exampleCounter.mutex);
        // Change counter value
        if (EVERY_ONCE_IN_A_WHILE) {
            // Skip any updates that occur while we are updating the counter.
            while(!myCounterMutex.try_lock()) {
                // Wait for the interrupt to release its lock.
            }
            // Set the counter to a new value.
            exampleCounter.value = 500;
            // Updates will start again as soon as we unlock it.
            myCounterMutex.unlock();
        }
        // Mutexes for __irqLongFunction.
        static IrqMutexAccessor myBufferMutexes[2] = {
            IrqMutexAccessor(exampleBuffers[0].mutex),
            IrqMutexAccessor(exampleBuffers[1].mutex)
        };
        // Process buffers from __irqLongFunction.
        for (size_t i = 0; i < 2; i++)  {
            // Obtain a lock so we can read the data.
            if (!myBufferMutexes[i].try_lock()) continue;
            // Check that the buffer isn't empty.
            if (exampleBuffers[i].index == 0) {
                myBufferMutexes[i].unlock(); // Don't forget to unlock.
                continue;
            }
            // ... read and do something with the data here ...
            exampleBuffers[i].index = 0;
            myBufferMutexes[i].unlock();
        }
    }
}

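Also note that I used volatile on any variable that is read or written by the interrupt routine (unless the variable is only accessed from the interrupt, like the static bufferIndex value in __irqLongFunction). I've read that mutexes remove some of the need for volatile in multithreaded code, but I don't think that applies here. Did I use the right amount of volatile? I used it on: ExampleBuffer[].data[256], ExampleBuffer[].index, and ExampleCounter.value.

I apologize for the long answer, but perhaps that is fitting for a long question.

To answer your first question, I would say that your implementation of IrqMutex is not safe. Let me try to explain where I see problems.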

Function `nextAccessor`

std::sig_atomic_t nextAccessor(void) { return ++accessorIdEnum; }

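This function has a race condition, because the increment operator is not atomic even though it operates on a sig_atomic_t (and marking the variable volatile would not change that). It involves 3 operations: reading the current value of accessorIdEnum, incrementing it, and writing the result back. If two IrqMutexAccessors are created at the same time, they may both get the same ID.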
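To make the three steps concrete, here is a small host-side sketch that spells the increment out as read, increment, write, and lets a simulated interrupt run between the read and the write-back. The names nextAccessorSplit, nextAccessorPlain and demonstrateDuplicateId are hypothetical and exist only for this illustration; real interrupts are not involved.

```cpp
#include <cassert>
#include <csignal>

// Shared ID counter, standing in for IrqMutex::accessorIdEnum.
static std::sig_atomic_t accessorIdEnum = 0;

// ++accessorIdEnum written out as the three steps the CPU may perform.
// `interrupt` is a hypothetical hook that stands in for an ISR firing
// between the read and the write-back.
template <typename Interrupt>
std::sig_atomic_t nextAccessorSplit(Interrupt interrupt) {
    std::sig_atomic_t tmp = accessorIdEnum; // 1. read
    interrupt();                            //    <-- simulated ISR runs here
    tmp = tmp + 1;                          // 2. increment
    accessorIdEnum = tmp;                   // 3. write back
    return tmp;
}

// The same sequence with nothing interrupting it.
std::sig_atomic_t nextAccessorPlain() {
    return nextAccessorSplit([] {});
}

// Returns true if main and the simulated interrupt received the same ID.
bool demonstrateDuplicateId() {
    accessorIdEnum = 0;
    std::sig_atomic_t idFromInterrupt = 0;
    std::sig_atomic_t idFromMain = nextAccessorSplit([&] {
        idFromInterrupt = nextAccessorPlain();
    });
    assert(accessorIdEnum == 1); // two calls, but only one increment survived
    return idFromMain == idFromInterrupt;
}
```

Calling demonstrateDuplicateId() returns true in this simulation: both callers read 0 and both end up with ID 1.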

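Function `try_lock`

The try_lock function also has a race condition. One thread (e.g. main) can enter the while loop, and then, before it takes ownership, another thread (e.g. an interrupt) can also enter the while loop and take ownership of the lock (returning true). The first thread can then continue on to owner = accessorId and thus "also" take the lock. So two threads (or your main thread and an interrupt) can both try_lock an unowned mutex at the same time and both get true.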
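The interleaving described above can be reproduced deterministically on a host machine. The sketch below copies the check-then-set sequence from the question and adds a hypothetical hook (the interrupt parameter) at the point where an ISR could fire; racyTryLock and demonstrateDoubleLock are illustrative names, not part of the original code.

```cpp
#include <cassert>
#include <csignal>
#include <cstdint>

static volatile std::sig_atomic_t owner = SIG_ATOMIC_MIN;

// The question's check-then-set sequence. `interrupt` stands in for an
// ISR that arrives between the CHECK and the SET.
template <typename Interrupt>
bool racyTryLock(std::sig_atomic_t accessorId, Interrupt interrupt) {
    while (owner == SIG_ATOMIC_MIN) {           // CHECK
        interrupt();                            // <-- simulated ISR runs here
        owner = accessorId;                     // SET (overwrites any earlier claim)
        if (owner == accessorId) return true;   // the "double check" passes anyway
    }
    return owner == accessorId;
}

// Returns true if both main (id 1) and the simulated ISR (id 2) were told
// that they hold the lock.
bool demonstrateDoubleLock() {
    owner = SIG_ATOMIC_MIN;
    bool isrGotLock = false;
    bool mainGotLock = racyTryLock(1, [&] {
        isrGotLock = racyTryLock(2, [] {});
    });
    assert(owner == 1); // main's SET silently replaced the ISR's claim
    return mainGotLock && isrGotLock;
}
```

In this simulation demonstrateDoubleLock() returns true: the second check only confirms what the caller itself just wrote, so it cannot detect that someone else claimed the mutex in between.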

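Disabling interrupts with RAII

We can achieve some level of simplicity and encapsulation by using RAII for interrupt disabling, for example with the following class: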

class InterruptLock {
public:
    InterruptLock() { 
        prevInterruptState = currentInterruptState();
        disableInterrupts();
    }
    ~InterruptLock() { 
        restoreInterrupts(prevInterruptState);
    }
private:
    int prevInterruptState; // Whatever type this should be for the platform
    InterruptLock(const InterruptLock&); // Not copy-constructable
};

I recommend disabling interrupts to get the atomicity you need within the mutex implementation itself. For example:

bool try_lock(std::sig_atomic_t accessorId) {
    InterruptLock lock;
    if (owner == SIG_ATOMIC_MIN) {
        owner = accessorId;
        return true;
    }
    return false;
}
bool unlock(std::sig_atomic_t accessorId) {
    InterruptLock lock;
    if (owner == accessorId) {
        owner = SIG_ATOMIC_MIN;
        return true;
    }
    return false;
}

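Depending on your platform this may look different, but you get the idea.

As you say, this provides a way to abstract the disabling and enabling of interrupts away from general code and encapsulate it in this one class.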
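One property worth checking in such a guard is that nested locks restore the previous state instead of blindly re-enabling interrupts. Here is a host-side sketch under that assumption: currentInterruptState, disableInterrupts and restoreInterrupts are replaced by hypothetical stubs around a plain bool, whereas real firmware would use the platform's own intrinsics.

```cpp
#include <cassert>

// Hypothetical host-side stand-ins for the platform primitives. On real
// hardware these would read and write the CPU's interrupt-enable state.
static bool interruptsEnabled = true;
int  currentInterruptState()      { return interruptsEnabled ? 1 : 0; }
void disableInterrupts()          { interruptsEnabled = false; }
void restoreInterrupts(int state) { interruptsEnabled = (state != 0); }

class InterruptLock {
public:
    InterruptLock() : prevInterruptState(currentInterruptState()) {
        disableInterrupts();
    }
    ~InterruptLock() { restoreInterrupts(prevInterruptState); }
private:
    int prevInterruptState;
    InterruptLock(const InterruptLock&);            // not copyable
    InterruptLock& operator=(const InterruptLock&); // not assignable
};

// Returns true if interrupts are enabled again once the outermost guard
// has been destroyed; the asserts check the intermediate states.
bool nestedLocksRestoreCorrectly() {
    assert(interruptsEnabled);
    {
        InterruptLock outer;
        assert(!interruptsEnabled);
        {
            InterruptLock inner;
            assert(!interruptsEnabled);
        }
        // The inner destructor restored "disabled", not "enabled".
        assert(!interruptsEnabled);
    }
    return interruptsEnabled;
}
```

Saving and restoring the state is what makes it safe to use the guard inside a function that may itself be called with interrupts already disabled, including from an interrupt handler.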

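Mutexes and interrupts

Having said how I would consider implementing a mutex class, I would not actually use a mutex class for your use cases. As you pointed out, mutexes don't really work well with interrupts, because an interrupt can't "block" while trying to acquire the mutex. For that reason, for code that directly exchanges data with an interrupt, I would strongly recommend disabling interrupts directly instead (for the very short time that the main "thread" touches the data).

So your counter might simply look like this: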

volatile long long int exampleCounter;
void __irqQuickFunction(void) {
    exampleCounter++;
}
...
// Change counter value
if (EVERY_ONCE_IN_A_WHILE) {
    InterruptLock lock;
    exampleCounter = 500;
}

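In my mind this is easier to read, easier to reason about, and won't "slip" when there is contention (i.e. miss timer ticks).

Regarding the buffer use case, I would strongly recommend against holding a lock across multiple interrupt cycles. A lock/mutex should be held for the briefest moment needed to "touch" a piece of memory, just long enough to read or write it. Get in, get out.

So the buffering example could look like this: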
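The same guard is also useful when main reads the counter. On a microcontroller whose native word is narrower than long long, a read typically takes several instructions, so an interrupt in the middle could yield a torn value. A minimal sketch, assuming host-side stubs for the interrupt primitives; readCounter, setCounter and resetThenTick are hypothetical helper names.

```cpp
#include <cassert>

// Hypothetical host-side stubs; real firmware would use the platform's
// interrupt-control intrinsics instead.
static bool interruptsEnabled = true;
int  currentInterruptState()      { return interruptsEnabled ? 1 : 0; }
void disableInterrupts()          { interruptsEnabled = false; }
void restoreInterrupts(int state) { interruptsEnabled = (state != 0); }

class InterruptLock {
public:
    InterruptLock() : prev(currentInterruptState()) { disableInterrupts(); }
    ~InterruptLock() { restoreInterrupts(prev); }
private:
    int prev;
    InterruptLock(const InterruptLock&);
    InterruptLock& operator=(const InterruptLock&);
};

volatile long long exampleCounter = 0;

// What the interrupt handler would do on each tick.
void irqQuickFunction() {
    exampleCounter = exampleCounter + 1;
}

// Main-side read: the whole multi-word value is copied with interrupts off.
long long readCounter() {
    InterruptLock lock;
    return exampleCounter; // copied before the guard is destroyed
}

// Main-side write, as in the example above.
void setCounter(long long value) {
    InterruptLock lock;
    exampleCounter = value;
}

// Example sequence: main resets the counter, then one tick arrives.
long long resetThenTick() {
    setCounter(500);
    irqQuickFunction();
    long long seen = readCounter();
    assert(interruptsEnabled); // the guards restored the previous state
    return seen;
}
```

On targets where the counter fits in a single native word the read guard may be unnecessary, but it costs little and keeps the access pattern uniform.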

struct ExampleBuffer {
    char data[256];
} exampleBuffers[2];
ExampleBuffer* volatile bufferAwaitingConsumption = nullptr;
ExampleBuffer* volatile freeBuffer = &exampleBuffers[1];
const volatile char * const REGISTER;
void __irqLongFunction(void) {
    static const char END_PACKET_SIGNAL = '\0'; // Assumed value; use the protocol's real end-of-packet byte.
    static size_t index = 0;
    static ExampleBuffer* receiveBuffer = &exampleBuffers[0];
    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    receiveBuffer->data[index++] = c;
    // End of packet?
    if (c == END_PACKET_SIGNAL) {
        // Make the packet available to the consumer
        bufferAwaitingConsumption = receiveBuffer;
        // Move on to the next buffer
        receiveBuffer = freeBuffer;
        freeBuffer = nullptr;
        index = 0;
    }
}

int main(void) {
    while (true) {
        // Fetch packet from shared variable
        ExampleBuffer* packet;
        {
            InterruptLock lock;
            packet = bufferAwaitingConsumption;
            bufferAwaitingConsumption = nullptr;
        }
        if (packet) {
            // ... read and do something with the data here ...
            // Once we're done with the buffer, we need to release it back to the producer
            {
                InterruptLock lock;
                freeBuffer = packet;
            }
        }
    }
}

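This code is arguably easier to reason about, since only two memory locations are shared between the interrupt and the main loop: one to pass packets from the interrupt to the main loop, and the other to pass empty buffers back to the interrupt. We also only touch those variables under the "lock", and only for the minimum time needed to "move" the value. (For simplicity, I've skipped the buffer overflow logic for when the main loop takes too long to free a buffer.)

It's true that in this case a lock may not even be needed, since we are only reading and writing simple values, but the cost of disabling interrupts is low, and the risk of making a mistake otherwise is, in my opinion, not worth it.

Edit

As pointed out in the comments, the solution above was only meant to address the multithreading problem and omitted overflow checking. Here is a more complete solution that should be robust under overflow conditions: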
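For completeness, where a C++11 toolchain is available and pointer-sized atomics are lock-free on the target (an assumption that has to be checked per platform), the two hand-off variables could be expressed with std::atomic instead of disabling interrupts. This is only a sketch of that alternative, not the approach recommended above; publishAndSwap, takePacket, releaseBuffer and handOffRoundTrip are hypothetical names, and like the example above it does not handle overflow.

```cpp
#include <atomic>
#include <cassert>

struct ExampleBuffer { char data[256]; };

// Refuse to build if pointer atomics would need a hidden lock, which an
// interrupt handler could not safely wait on.
static_assert(ATOMIC_POINTER_LOCK_FREE == 2, "pointer atomics must be lock-free");

static ExampleBuffer exampleBuffers[2];
static std::atomic<ExampleBuffer*> bufferAwaitingConsumption(nullptr);
static std::atomic<ExampleBuffer*> freeBuffer(&exampleBuffers[1]);

// ISR side: publish a finished packet and take the spare buffer.
// Returns the buffer to fill next, or nullptr if none is free.
ExampleBuffer* publishAndSwap(ExampleBuffer* finished) {
    bufferAwaitingConsumption.store(finished);
    return freeBuffer.exchange(nullptr);
}

// Main side: take the pending packet, if any, in one indivisible step.
ExampleBuffer* takePacket() {
    return bufferAwaitingConsumption.exchange(nullptr);
}

// Main side: hand a processed buffer back to the ISR.
void releaseBuffer(ExampleBuffer* buffer) {
    assert(buffer != nullptr);
    freeBuffer.store(buffer);
}

// One full cycle: the ISR publishes buffer 0, main consumes and returns it.
bool handOffRoundTrip() {
    ExampleBuffer* next = publishAndSwap(&exampleBuffers[0]);
    ExampleBuffer* packet = takePacket();
    bool queueNowEmpty = (takePacket() == nullptr);
    releaseBuffer(packet);
    return next == &exampleBuffers[1]
        && packet == &exampleBuffers[0]
        && queueNowEmpty
        && freeBuffer.load() == &exampleBuffers[0];
}
```

The exchange calls replace the "read, then clear" pairs that were previously wrapped in an InterruptLock. On compilers without C++11 atomics, disabling interrupts remains the portable option.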

const size_t BUFFER_COUNT = 2; 
struct ExampleBuffer {
    char data[256];
    ExampleBuffer* next;
} exampleBuffers[BUFFER_COUNT];
volatile size_t overflowCount = 0;
class BufferList {
public:
    BufferList() : first(nullptr), last(nullptr) { }
    // Atomic enqueue
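    void enqueue(ExampleBuffer* buffer) {
        InterruptLock lock;
        buffer->next = nullptr; // The new tail has no successor.
        if (last)
            last->next = buffer; // Append to a non-empty list.
        else
            first = buffer;      // The list was empty.
        last = buffer;
    }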
    // Atomic dequeue (or returns null)
    ExampleBuffer* dequeueOrNull() {
        InterruptLock lock;
        ExampleBuffer* result = first;
        if (first) {
            first = first->next;
            if (!first)
                last = nullptr;
        }
        return result;
    }
private:
    ExampleBuffer* first;
    ExampleBuffer* last;
} freeBuffers, buffersAwaitingConsumption;
const volatile char * const REGISTER;
void __irqLongFunction(void) {
    static const char END_PACKET_SIGNAL = '\0'; // Assumed value; use the protocol's real end-of-packet byte.
    static size_t index = 0;
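    // Start with no buffer: main() puts every buffer on the free list, so one is claimed from there on first use.
    static ExampleBuffer* receiveBuffer = nullptr;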
    // Recovery from overflow?
    if (!receiveBuffer) {
        // Try get another free buffer
        receiveBuffer = freeBuffers.dequeueOrNull();
        // Still no buffer?
        if (!receiveBuffer) {
            overflowCount++;
            return; 
        }
    }
    // Get data from the hardware and modify the buffer here.
    const char c = *REGISTER;
    if (index < sizeof(receiveBuffer->data))
        receiveBuffer->data[index++] = c;
    // End of packet? (Bytes that do not fit in the buffer are dropped above.)
    if (c == END_PACKET_SIGNAL) {
        // Make the packet available to the consumer
        buffersAwaitingConsumption.enqueue(receiveBuffer);
        // Move on to the next free buffer
        receiveBuffer = freeBuffers.dequeueOrNull();
        index = 0;
    }
}
size_t getAndResetOverflowCount() {
    InterruptLock lock;
    size_t result = overflowCount;
    overflowCount = 0;
    return result;
}

int main(void) {
    // All buffers are free at the start
    for (size_t i = 0; i < BUFFER_COUNT; i++)
        freeBuffers.enqueue(&exampleBuffers[i]);
    while (true) {
        // Fetch packet from shared variable
        ExampleBuffer* packet = buffersAwaitingConsumption.dequeueOrNull();
        if (packet) {
            // ... read and do something with the data here ...
            // Once we're done with the buffer, we need to release it back to the producer
            freeBuffers.enqueue(packet);
        }
        size_t overflowBytes = getAndResetOverflowCount();
        if (overflowBytes) {
            // ...
        }
    }
}

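Main changes:

If the interrupt runs out of free buffers, it will recover.
If the interrupt receives data while it has no receive buffer, it communicates that to the main thread via getAndResetOverflowCount.
If you keep getting buffer overflows, you can simply increase the buffer count.
I've encapsulated the multithreaded access into a queue class implemented as a linked list (BufferList), which supports atomic dequeue and enqueue. The previous example also used queues, but of length 0-1 (either an item is enqueued or it isn't), so the queue implementation was just a single variable. When free buffers run out, the receive queue can hold 2 items, so I upgraded it to a proper queue rather than adding more shared variables.

If the interrupt is the producer and the mainline code is the consumer, surely it's as simple as disabling the interrupt for the duration of the consume operation?

That's how I used to do it in my embedded microcontroller days.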
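A linked-list queue like this is easy to exercise on a host machine before it goes anywhere near an interrupt. The sketch below assumes a no-op stand-in for InterruptLock (there are no interrupts on the host) and uses a hypothetical helper, queueIsFifo, to check that buffers come back in the order they went in and that the list ends up empty.

```cpp
#include <cassert>
#include <cstddef>

// No-op stand-in for the real guard; on the host there are no interrupts.
struct InterruptLock {
    InterruptLock() {}
    ~InterruptLock() {}
};

struct ExampleBuffer {
    char data[256];
    ExampleBuffer* next;
};

class BufferList {
public:
    BufferList() : first(nullptr), last(nullptr) {}
    void enqueue(ExampleBuffer* buffer) {
        InterruptLock lock;
        buffer->next = nullptr;          // new tail has no successor
        if (last) last->next = buffer;   // append to a non-empty list
        else      first = buffer;        // or start a new list
        last = buffer;
    }
    ExampleBuffer* dequeueOrNull() {
        InterruptLock lock;
        ExampleBuffer* result = first;
        if (first) {
            first = first->next;
            if (!first) last = nullptr;  // list became empty
        }
        return result;
    }
private:
    ExampleBuffer* first;
    ExampleBuffer* last;
};

// Enqueues `count` buffers (at most 4), then checks that they are
// dequeued in the same order and that the list is empty afterwards.
bool queueIsFifo(std::size_t count) {
    static ExampleBuffer buffers[4];
    assert(count <= 4);
    BufferList list;
    for (std::size_t i = 0; i < count; i++)
        list.enqueue(&buffers[i]);
    for (std::size_t i = 0; i < count; i++)
        if (list.dequeueOrNull() != &buffers[i]) return false;
    return list.dequeueOrNull() == nullptr;
}
```

Checking the empty, single-item and multi-item cases covers the branches in enqueue and dequeueOrNull where first and last have to be updated together.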
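A sketch of what that might look like, assuming only the one receive interrupt needs to be held off rather than all interrupts. The enable bit is modelled by a plain bool, and RxInterruptMask, rxIsr and consume are hypothetical names; on real hardware the guard would set and clear the peripheral's interrupt-enable bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a peripheral's interrupt-enable bit.
static bool rxInterruptEnabled = true;

// Scoped guard that masks only the receive interrupt and restores its
// previous state on exit.
class RxInterruptMask {
public:
    RxInterruptMask() : wasEnabled(rxInterruptEnabled) { rxInterruptEnabled = false; }
    ~RxInterruptMask() { rxInterruptEnabled = wasEnabled; }
private:
    bool wasEnabled;
    RxInterruptMask(const RxInterruptMask&);
    RxInterruptMask& operator=(const RxInterruptMask&);
};

// Data shared between the producer (ISR) and the consumer (main).
static char rxData[16];
static std::size_t rxCount = 0;

// What the ISR would do for each received byte.
void rxIsr(char c) {
    if (rxCount < sizeof(rxData))
        rxData[rxCount++] = c;
}

// Main side: copy out whatever has arrived while the ISR is held off.
// Returns the number of bytes copied; anything beyond `capacity` is dropped.
std::size_t consume(char* out, std::size_t capacity) {
    assert(out != nullptr);
    RxInterruptMask mask;
    std::size_t n = rxCount < capacity ? rxCount : capacity;
    std::memcpy(out, rxData, n);
    rxCount = 0;
    return n;
}
```

Copying the data out and processing it after the guard is released keeps the masked window short, in line with the "get in, get out" advice in the earlier answer.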