Parallelizing an inner loop with OpenMP

Keywords: OpenMP, parallel, loop, inner. Last updated: 2023-10-16

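Suppose we have two nested loops. The inner loop should be parallelized, but the outer loop needs to be executed sequentially. Then the following code does what we want: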

for (int i = 0; i < N; ++i) {
    #pragma omp parallel for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
        // Do some work
    }
}
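To make the pattern concrete, here is a minimal self-contained sketch. The function name `triangular_sums` and the bounds `first(i) = i`, `last(i) = n` are assumptions chosen for illustration; the question does not say what `first` and `last` compute. The outer loop runs on a single thread, and each `parallel for` ends with an implicit barrier, so step `i + 1` never starts before step `i` has finished.

```cpp
#include <cassert>
#include <vector>

// Illustrative only: data[j] accumulates every outer index i with i <= j,
// so after the loops data[j] == j * (j + 1) / 2.
std::vector<long> triangular_sums(int n) {
    assert(n >= 0);
    std::vector<long> data(n, 0);
    for (int i = 0; i < n; ++i) {            // outer loop: strictly sequential
        // A new team is forked here and joined at the end of the inner loop.
        #pragma omp parallel for schedule(static)
        for (int j = i; j < n; ++j) {        // assumed first(i) = i, last(i) = n
            data[j] += i;                    // each j is handled by exactly one thread
        }
    }
    return data;
}
```

Compiled without OpenMP support, the pragma is ignored and the function returns the same values serially.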

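Now suppose that each thread has to obtain some thread-local object to carry out the work in the inner loop, and that obtaining this object is costly. We therefore do not want to do the following, which pays that cost on every inner iteration: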

for (int i = 0; i < N; ++i) {
    #pragma omp parallel for schedule(static)
    for (int j = first(i); j < last(i); ++j) {
        ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // Costly!
        // Do some work with the help of obj
    }
}

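How can I solve this problem?

Each thread should request its local object only once.
The inner loop should be parallelized among all threads.
The iterations of the outer loop should be executed one after the other.

My idea is the following, but does it really do what I want?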

#pragma omp parallel
{
    // Acquired once per thread, before any loop iteration runs
    ThreadLocalObject &obj = GetTLO(omp_get_thread_num());
    for (int i = 0; i < N; ++i) {
        #pragma omp for schedule(static)
        for (int j = first(i); j < last(i); ++j) {
            // Do some work with the help of obj
        }
    }
}
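One way to check this idea is to count the acquisitions. The sketch below is a minimal test harness, not the asker's code: `ThreadLocalObject`, `GetTLO`, `g_objects`, `g_acquisitions`, `Stats` and `run_hoisted` are hypothetical stand-ins, and the inner loop simply runs over `0..M`. Every thread executes all `N` outer iterations, and the implicit barrier at the end of each `omp for` keeps the outer iterations in order. This relies on every thread computing the same loop bounds, because all threads of the team must reach each `omp for`.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch still builds when OpenMP is not enabled.
static int omp_get_thread_num() { return 0; }
static int omp_get_num_threads() { return 1; }
static int omp_get_max_threads() { return 1; }
#endif

// Hypothetical stand-ins for the question's ThreadLocalObject and GetTLO.
struct ThreadLocalObject { long sum = 0; };
static std::vector<ThreadLocalObject> g_objects;
static int g_acquisitions = 0;               // number of GetTLO calls

ThreadLocalObject &GetTLO(int tid) {
    assert(tid >= 0 && tid < static_cast<int>(g_objects.size()));
    #pragma omp atomic
    ++g_acquisitions;
    return g_objects[tid];
}

struct Stats { long total; int acquisitions; int threads; };

Stats run_hoisted(int N, int M) {
    g_objects.assign(omp_get_max_threads(), ThreadLocalObject{});
    g_acquisitions = 0;
    int threads = 1;
    #pragma omp parallel
    {
        ThreadLocalObject &obj = GetTLO(omp_get_thread_num()); // once per thread
        #pragma omp single
        threads = omp_get_num_threads();     // record the actual team size
        for (int i = 0; i < N; ++i) {        // every thread runs all N iterations
            #pragma omp for schedule(static)
            for (int j = 0; j < M; ++j) {
                obj.sum += j;                // only this thread touches obj
            }
            // implicit barrier: no thread starts i + 1 before i is complete
        }
    }
    long total = 0;
    for (const ThreadLocalObject &o : g_objects) total += o.sum;
    return {total, g_acquisitions, threads};
}
```

With this structure the acquisition count equals the team size rather than the number of inner iterations, which is the first of the three requirements listed above.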

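I really don't see why the complexity of threadprivate should be necessary when you can simply use a pool of objects. The basic idea should go along these lines: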

#pragma omp parallel
{
    // Will hold a handle to the object pool
    std::shared_ptr<ObjectPool> pool;
    #pragma omp single copyprivate(pool)
    {
        // A single thread creates a pool of num_threads objects;
        // copyprivate broadcasts the handle to the other threads
        pool = create_object_pool(omp_get_num_threads());
    }
    for (int i = 0; i < N; ++i)
    {
        // "omp for", not "omp parallel for": share the work among the
        // existing team instead of opening a nested parallel region
        #pragma omp for schedule(static)
        for (int j = first(i); j < last(i); ++j)
        {
            // The object is not re-created; the pool just returns
            // a reference to this thread's object
            auto &r = pool->get(omp_get_thread_num());
            // Do work with r
        }
    }
}
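A self-contained version of the pool idea might look like the sketch below. The answer does not define `ObjectPool` or `create_object_pool`, so the class here is an assumption: a thin wrapper around a `std::vector` with a `get(int)` accessor. `Worker`, `keep` and `run_with_pool` are likewise hypothetical names. The lookup is hoisted out of the loops, which is optional but avoids repeating it on every iteration.

```cpp
#include <cassert>
#include <memory>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch still builds when OpenMP is not enabled.
static int omp_get_thread_num() { return 0; }
static int omp_get_num_threads() { return 1; }
#endif

struct Worker { long sum = 0; };             // hypothetical costly-to-build object

// Assumed pool layout: one pre-built object per thread, indexed by thread number.
class ObjectPool {
public:
    explicit ObjectPool(int n) : objects_(n) {}
    Worker &get(int tid) { return objects_.at(tid); }
    long total() const {
        long t = 0;
        for (const Worker &w : objects_) t += w.sum;
        return t;
    }
private:
    std::vector<Worker> objects_;
};

std::shared_ptr<ObjectPool> create_object_pool(int n) {
    assert(n > 0);
    return std::make_shared<ObjectPool>(n);
}

long run_with_pool(int N, int M) {
    std::shared_ptr<ObjectPool> keep;        // shared: used to read results afterwards
    #pragma omp parallel
    {
        std::shared_ptr<ObjectPool> pool;    // private handle, one per thread
        #pragma omp single copyprivate(pool)
        {
            pool = create_object_pool(omp_get_num_threads());
            keep = pool;                     // written by the single thread only
        }
        Worker &w = pool->get(omp_get_thread_num()); // looked up once per thread
        for (int i = 0; i < N; ++i) {
            #pragma omp for schedule(static) // reuses the existing team
            for (int j = 0; j < M; ++j) {
                w.sum += j;
            }
        }
    }
    return keep->total();
}
```

Because `pool` is a `std::shared_ptr`, the arrow operator is needed to reach `ObjectPool::get`; `pool.get()` would call the smart pointer's own accessor, which takes no arguments.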