如何并行化此结构

How to parallelize this structure?

本文关键字：结构并行化更新时间：2023-10-16

我正试图在C++中使用OpenMP并行化以下结构：

x1,x2,y1,y2,k1,k2 = 0;
a1,a2,b1,b2; //initialized to some value
vec1,vec2;
for (i=0;i<N;++i) {
   for (j=0;j<M;++j) {
      x2 = j - a2;
      y2 = i - b2;
      func(x1,y1,x2,y2); // the function changes x1,y1 values
      x2 = x1;
      y2 = y1;
      func2(x1,y1,x2,y2); // the function changes x1,y1 values
      x1 += a1;
      y1 += b1;
      k1 = func3(x1,y1);
      vec2[k2] = vec1[k1];
      vec2[k2+1] = vec1[k1+1];
      k2 += 2;
   }
}

你能帮我吗？。我真的很感激你能提供的任何帮助。

编辑：

我尝试过的最后一个解决方案是：

x1,x2,y1,y2,k1,k2 = 0;
a1,a2,b1,b2; //initialized to some value
vec1,vec2;
#pragma omp parallel for ordered schedule(dynamic,1) collapse(2)
for (i=0;i<N;++i) {
   for (j=0;j<M;++j) {
      x2 = j - a2;
      y2 = i - b2;
      func(x1,y1,x2,y2); // the function changes x1,y1 values
      x2 = x1;
      y2 = y1;
      func2(x1,y1,x2,y2); // the function changes x1,y1 values
#pragma omp critical
{
      x1 += a1;
      y1 += b1;
}
      k1 = func3(x1,y1);
#pragma omp ordered
{
      vec2[k2] = vec1[k1];
      vec2[k2+1] = vec1[k1+1];
}
#pragma omp atomic
      k2 += 2;
   }
}

导致分段故障。

我阅读了你与@Guiroux的聊天，并确定你的循环是独立的。当前从一次迭代到另一次迭代的唯一东西是k2，它可以直接从i和j计算出来（k2 = 2*j+2*M*i）。

因此，您的并行代码可以简单地为

int k1, k2;
double x1,x2,y1,y2;
double a1,a2,b1,b2; //initialized to some values
double vec1[2*M*N],vec2[2*M*N]; // vec1 is read-only past this point
/* 
   NOTE: private(var) means that it may no longer have same value it had previously.
   In your case, these were all set to 0 before. After reading chat, it seems that
   they were never actually used as input anyway. As long as they are written to 
   before being read from (i.e. x1,y1 in func), no seg fault will occur.
*/
#pragma omp parallel for private(x2,y2,x1,y1,k1,k2)
for (int i=0; i<N; ++i) {      // I defined i,j here
   for (int j=0; j<M; ++j) {   // If you define them outside parallel for, 
                               // they must also be made private
      // variables set below are all private, so no threads will overwrite 
      // work that other threads have done
      x2 = j - a2;
      y2 = i - b2;
      func(x1,y1,x2,y2); // the function sets x1,y1 values
      x2 = x1;
      y2 = y1;
      func2(x1,y1,x2,y2); // the function sets x1,y1 values
      x1 += a1;
      y1 += b1;
      k1 = func3(x1,y1);
      // k2 is never the same for 2 different values of i, 
      // so different threads will never clobber each other here:
      k2 = 2*j+2*M*i;
      vec2[k2] = vec1[k1];
      vec2[k2+1] = vec1[k1+1];
   }
}

您不必同时折叠两个循环。只要N > nCoresOnYourComputer，就可以看到工作将如何在所有处理器之间平均分配。

希望这一切都有意义。试着在我的代码中弄清楚为什么我必须定义某些变量private（而不是默认的shared），以及为什么k2必须像我那样重新定义。

留给读者的练习：除此之外，如何设置k2（也可以安全地避免竞争条件），但主要保留您最初使用的逻辑（即每次迭代k2 += 2）？