Converting a parallel program from OpenMP to OpenCL

Keywords: OpenCL, conversion, OpenMP, parallel program. Updated: 2023-10-16

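I would like to know how to convert the OpenMP program below into an OpenCL program.

The parallel part of the algorithm, implemented with OpenMP, looks like this: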

#pragma omp parallel
  {
    int thread_id = omp_get_thread_num();
    //double mt_probThreshold = mt_nProbThreshold_;
    double mt_probThreshold = nProbThreshold;
    int mt_nMaxCandidate = mt_nMaxCandidate_;
    double mt_nMinProb = mt_nMinProb_;
    int has_next = 1;
    std::list<ScrBox3d> mt_detected;
    ScrBox3d  sample;
    while(has_next) {
#pragma omp critical
      {  // '{' is very important and define the block of code that needs lock.
        // Don't remove this pair of '{' and '}'.
        if(piter_ == box_.end()) {
          has_next = 0;
        } else{
          sample = *piter_;
          ++piter_;
        }
      }  // '}' is very important and define the block of code that needs lock.
      if(has_next){
        this->SetSample(&sample, thread_id);
        //UpdateSample(sample, thread_id); // May be necessary for more sophisticated features
        sample._prob = (float)this->Prob( true, thread_id, mt_probThreshold);
        //sample._prob = (float)_clf->LogLikelihood( thread_id);
        InsertCandidate( mt_detected, sample, mt_probThreshold, mt_nMaxCandidate, mt_nMinProb );
      }
    }  // end of while(has_next)
#pragma omp critical
    {  // '{' is very important and define the block of code that needs lock.
      // Don't remove this pair of '{' and '}'.
      if(mt_detected_.size()==0) {
        mt_detected_    = mt_detected;
        //mt_nProbThreshold_  = mt_probThreshold;
        nProbThreshold = mt_probThreshold;
      } else {
        for(std::list<ScrBox3d>::iterator it = mt_detected.begin();
            it!=mt_detected.end(); ++it)
          InsertCandidate( mt_detected_, *it, /*mt_nProbThreshold_*/nProbThreshold,
                           mt_nMaxCandidate_, mt_nMinProb_ );
      }
    }  // '}' is very important and define the block of code that needs lock.
  }//parallel section end

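My question is: can this part be implemented in OpenCL? I followed a series of OpenCL tutorials and understand how it works. I was writing the code in .cu files (I had previously installed the CUDA toolkit), but this case is more complex because it uses many header files, template classes, and object-oriented programming.

How can I port this OpenMP part to OpenCL? Should I create a new .cu file?

Any advice would help. Thanks in advance.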

Edit:

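Using the VS profiler, I noticed that most of the execution time is spent in the InsertCandidate() function, so I am thinking of writing a kernel to run this function on the GPU. The most expensive operation in this function is the for loop. As you can see, though, each iteration of the loop contains three if statements, which may cause divergence and therefore serialization, even when executed on the GPU.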

for( iter = detected.begin(); iter != detected.end(); iter++ )
    {
        if( nCandidate == nMaxCandidate-1 )
            nProbThreshold = iter->_prob;
        if( box._prob >= iter->_prob )
            break;
        if( nCandidate >= nMaxCandidate && box._prob <= nMinProb )
            break;
        nCandidate ++;
    }

In conclusion, can this program be converted to OpenCL?

It may be possible to convert your sample code to OpenCL, but I see a few issues with doing so.

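Adding work to a process while it is executing is a fairly recent feature in OpenCL. You would either have to use OpenCL 2.0, or know ahead of time how much work will be added and pre-allocate the memory to store the new data structures. The calls to InsertCandidate may be the part that "can't" be converted to OpenCL.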
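To make the pre-allocation idea concrete, here is a minimal sketch in plain C++ (no OpenCL calls), assuming the samples can be copied out of the list into a flat array first. `FlatBox`, `score_work_item`, `score_all`, and the scoring formula are all hypothetical placeholders and not part of the original code. Each loop iteration stands in for one work-item that reads slot `gid` and writes only slot `gid`, so the output size is known before anything runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-data sample. A kernel cannot walk a std::list,
// so the data would have to live in a flat buffer like this one.
struct FlatBox {
    float x, y, z;  // assumed fields, for illustration only
};

// Stand-in for a kernel body: work-item `gid` reads input slot gid and
// writes output slot gid. The formula is a placeholder, not the real Prob().
inline void score_work_item(std::size_t gid,
                            const std::vector<FlatBox>& in,
                            std::vector<float>& out_prob) {
    const FlatBox& b = in[gid];
    out_prob[gid] = 1.0f / (1.0f + b.x * b.x + b.y * b.y + b.z * b.z);
}

// Host side: the output buffer is sized up front (one slot per input),
// so nothing is allocated or enqueued while the "kernel" is running.
inline std::vector<float> score_all(const std::vector<FlatBox>& in) {
    std::vector<float> out_prob(in.size(), 0.0f);
    for (std::size_t gid = 0; gid < in.size(); ++gid) {
        score_work_item(gid, in, out_prob);  // one iteration = one work-item
    }
    assert(out_prob.size() == in.size());
    return out_prob;
}
```

With the scores back on the host, the candidate list could still be built serially, for example by calling InsertCandidate once per scored sample, which keeps the order-dependent part off the device.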

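You may be able to port the call to this->Prob(…) if the function is large enough. You would need to cache a bunch of calls by storing their parameters in a suitable data structure. By "a bunch" I mean at least hundreds, but ideally thousands or more. Again, this is only worth doing if this->Prob() is constant across all calls and complex enough to justify the round trip to the OpenCL device and back.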
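One way to cache calls like this is a small batching helper, sketched below in plain C++. Everything here is an assumption made for illustration: `ProbArgs` stands for whatever parameters one Prob() call needs, and `ProbBatcher` and its callback are hypothetical names. In a real port the callback would copy the batch to the device, enqueue the kernel, and read the results back.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical parameter record for one deferred Prob() call.
struct ProbArgs {
    int sample_index;
    float threshold;
};

// Collects deferred calls and hands them over in batches.
class ProbBatcher {
public:
    using BatchFn = std::function<void(const std::vector<ProbArgs>&)>;

    ProbBatcher(std::size_t min_batch, BatchFn run_batch)
        : min_batch_(min_batch), run_batch_(std::move(run_batch)) {
        assert(min_batch_ > 0);
        pending_.reserve(min_batch_);
    }

    // Record one call; dispatch only once the batch is full.
    void add(const ProbArgs& args) {
        pending_.push_back(args);
        if (pending_.size() >= min_batch_) {
            flush();
        }
    }

    // Dispatch whatever is left; call once after the last add().
    void flush() {
        if (pending_.empty()) {
            return;
        }
        run_batch_(pending_);  // placeholder for the device round trip
        pending_.clear();
    }

private:
    std::size_t min_batch_;
    BatchFn run_batch_;
    std::vector<ProbArgs> pending_;
};
```

The batch size is left as a constructor argument because the break-even point depends on how expensive Prob() is compared with the transfer cost, which would have to be measured.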