Simulate thread-local variables

Keywords: local variables, threads, simulation. Updated: 2023-10-16

I want to simulate thread-local variables for non-static members, something like this:

template< typename T, unsigned int tNumThread >
class ThreadLocal
{
private:
protected:
    T mData[tNumThread];
    unsigned int _getThreadIndex()
    {
        return ...; // i have a threadpool and each thread has an index from 0 to n
    }
public:
    ThreadLocal() {};
    ~ThreadLocal() {};
    T* operator ->()  // operator-> must return a pointer (or a type that itself has operator->)
    {
        return &mData[_getThreadIndex()];
    }
    ...
};

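The problem is that the number of threads is only determined at run time, so I have to allocate mData from the heap.

Is there a way to avoid the heap allocation and use a regular array as above?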

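Each thread has its own stack. Keep in mind that when a function returns, its stack frame is popped (or can be considered popped).

That is why "Stackless Python" exists: a single stack (which Python needs) and multiple threads do not work well together (see the global interpreter lock).

You could put the array in main, where it would persist, but remember that C(++) wants to know all sizes at compile time, so if the thread count varies (is not fixed at compile time) the size cannot be known.

What you really want is something that does not live in main, but then it cannot be a template parameterized on the thread count, because that number is not known at compile time.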
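One way to read the point above: the array size has to be a compile-time constant, but it only has to be an upper bound on the thread count. A minimal sketch, assuming a hypothetical class name `BoundedThreadLocal` and assuming the caller passes the thread index explicitly (the pool lookup in the question is elided, so it is not reproduced here):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: capacity is fixed at compile time (tMaxThreads),
// while the number of slots actually in use is chosen at run time.
template <typename T, std::size_t tMaxThreads>
class BoundedThreadLocal
{
public:
    explicit BoundedThreadLocal(std::size_t numThreads)
        : mNumThreads(numThreads)
    {
        // The run-time count must fit in the compile-time capacity.
        assert(numThreads <= tMaxThreads);
    }

    // The thread index is supplied by the caller because the thread pool
    // lookup from the question is not shown.
    T& get(std::size_t threadIndex)
    {
        assert(threadIndex < mNumThreads);
        return mData[threadIndex];
    }

    std::size_t size() const { return mNumThreads; }

private:
    std::array<T, tMaxThreads> mData{};  // lives inside the object, no heap
    std::size_t mNumThreads;
};

// Usage: room for up to 64 threads, 4 in use in this run.
inline int boundedDemo()
{
    BoundedThreadLocal<int, 64> counters(4);
    counters.get(0) = 10;
    counters.get(3) = 32;
    return counters.get(0) + counters.get(3);
}
```

The trade-off is that unused slots still take up space, and the program has to reject (or cap) a thread count larger than the chosen bound.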

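GCC provides a stack allocation function that works like malloc (presumably alloca), but I could not find it. It is best avoided anyway, because without it optimizations actually work.

Likewise, do not underestimate the CPU's prefetching or GCC's optimizations: putting the array on the heap is not bad either.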
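To illustrate the heap option: the allocation happens once, before the workers start, and after that each access is a plain index into contiguous memory. A rough sketch, where the function name `sumOfSquares` and the lambda-based workers are assumptions for illustration rather than the asker's thread pool:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: one slot per thread, allocated once on the heap.
inline long sumOfSquares(std::size_t numThreads)
{
    std::vector<long> slots(numThreads, 0);  // single heap allocation
    std::vector<std::thread> workers;
    workers.reserve(numThreads);

    for (std::size_t i = 0; i < numThreads; ++i) {
        // Each worker touches only its own slot, so no locking is needed.
        workers.emplace_back([&slots, i] {
            slots[i] = static_cast<long>(i * i);
        });
    }
    for (auto& w : workers) w.join();

    // After join() all writes are visible to this thread.
    long total = 0;
    for (long v : slots) total += v;
    return total;
}
```

Because the vector is never resized while the workers run, references into it stay valid for their whole lifetime.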

An interesting read with good pictures, though unfortunately only distantly related to the topic: http://www.nongnu.org/avr-libc/user-manual/malloc.html

I would suggest:

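std::unordered_map<std::thread::id, T,
                   std::hash<std::thread::id>,
                   std::equal_to<std::thread::id>,
                   stackalloc> myTLS;  // the allocator is the fifth template parameter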

Either lock every access, or populate the map up front and only read from it afterwards.

You can combine this with a stack allocator.

typedef short_alloc<pair<const thread::id, T>, maxthrds> stackalloc;

https://howardhinnant.github.io/stack_alloc.html
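The "populate up front, then read only" option can be sketched as follows. This is an illustration under assumptions: the function name `populateThenRead` is made up, the workers are held back with a `std::shared_future` until the map is filled, and the default allocator is used instead of a stack allocator to keep the example free of third-party code.

```cpp
#include <cstddef>
#include <future>
#include <thread>
#include <unordered_map>
#include <vector>

inline int populateThenRead(std::size_t numThreads)
{
    std::unordered_map<std::thread::id, int> tls;
    std::promise<void> ready;
    std::shared_future<void> start = ready.get_future().share();

    std::vector<std::thread> workers;
    workers.reserve(numThreads);
    for (std::size_t i = 0; i < numThreads; ++i) {
        workers.emplace_back([&tls, start] {
            start.wait();  // block until the map is fully populated
            // Lookup only: the map's structure is never modified here.
            auto it = tls.find(std::this_thread::get_id());
            it->second += 1;  // each thread writes only its own element
        });
    }

    // Populate phase: single-threaded, before any worker touches the map.
    for (const auto& w : workers) tls.emplace(w.get_id(), 0);
    ready.set_value();  // release the workers

    for (auto& w : workers) w.join();

    int total = 0;
    for (const auto& kv : tls) total += kv.second;
    return total;  // one increment per worker
}
```

No lock is needed after the populate phase because concurrent `find` calls do not change the container, and each thread modifies a different element.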

If you want another solution, here is what you can do:

struct Padder
{
    T t;
    char space_[128 - sizeof(T)];  // if sizeof(T) >= 128 just don't include the padding.
};
std::array<Padder, maxthreads> myTLS;

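This gives access without a MESI (cache coherence) bottleneck: the padding is meant to keep each thread's element from sharing a cache line with its neighbors.
With this approach you have to take care of tracking which thread owns which index in the array.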
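A variant of the padded array, sketched under assumptions: `alignas` replaces the manual `char` padding, `kMaxThreads` and `kSlotSize` are hypothetical constants, and each worker is simply handed its index at creation, which is one possible way to do the tracking mentioned above.

```cpp
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kMaxThreads = 8;  // hypothetical upper bound
constexpr std::size_t kSlotSize = 128;  // same spacing as the padding above

// alignas rounds sizeof(Slot) up to a multiple of kSlotSize, so no manual
// padding array is needed.
struct alignas(kSlotSize) Slot
{
    long value = 0;
};

static_assert(sizeof(Slot) % kSlotSize == 0,
              "slots must not share a 128-byte block");

inline long paddedDemo(std::size_t numThreads, long iterations)
{
    std::array<Slot, kMaxThreads> slots{};
    if (numThreads > kMaxThreads) numThreads = kMaxThreads;  // cap to capacity

    std::vector<std::thread> workers;
    workers.reserve(numThreads);
    for (std::size_t i = 0; i < numThreads; ++i) {
        // The index is handed to the thread at creation; this is the
        // index tracking the approach requires.
        workers.emplace_back([&slots, i, iterations] {
            for (long n = 0; n < iterations; ++n) ++slots[i].value;
        });
    }
    for (auto& w : workers) w.join();

    long total = 0;
    for (const Slot& s : slots) total += s.value;
    return total;
}
```

Since C++17, `std::hardware_destructive_interference_size` from `<new>` can be used in place of the hard-coded 128 where the standard library provides it.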