What is this Read after Write dependency?

Last updated: 2023-10-16

I have this loop in the following function:

Mat HessianDetector::hessianResponse(const Mat &inputImage, float norm)
{
   //...
   const float *in = inputImage.ptr<float>(1);
   Mat outputImage(rows, cols, CV_32FC1);
   float      *out = outputImage.ptr<float>(1) + 1;
   //...
   for (int r = 1; r < rows - 1; ++r)
   {
      float v11, v12, v21, v22, v31, v32;      
      v11 = in[-stride]; v12 = in[1 - stride];
      v21 = in[      0]; v22 = in[1         ];
      v31 = in[+stride]; v32 = in[1 + stride];
      in += 2;
      for (int c = 1; c < cols - 1; ++c, in++, out++)
      {
         /* fetch remaining values (last column) */
         const float v13 = in[-stride];
         const float v23 = *in;
         const float v33 = in[+stride];
         // compute 3x3 Hessian values from symmetric differences.
         float Lxx = (v21 - 2*v22 + v23);
         float Lyy = (v12 - 2*v22 + v32);
         float Lxy = (v13 - v11 + v31 - v33)/4.0f;
         /* normalize and write out */
         *out = (Lxx * Lyy - Lxy * Lxy)*norm2;
         /* move window */
         v11=v12; v12=v13;
         v21=v22; v22=v23;
         v31=v32; v32=v33;
         /* move input/output pointers */
      }
      out += 2;
   }
   return outputImage;
}

It is called like this:

#pragma omp for collapse(2) schedule(dynamic)
for(int i=0; i<levels; i++)
    for (int j = 1; j <= scaleCycles; j++)
    {
        int scaleCyclesLevel = scaleCycles * i;
        float curSigma = par.sigmas[j];
        hessResps[j+scaleCyclesLevel] = hessianResponse(blurs[j+scaleCyclesLevel], curSigma*curSigma);
    }

In particular, Intel Advisor says that the inner loop is time-consuming and should be vectorized:

for (int c = 1; c < cols - 1; ++c, in++, out++)

However, it also reports a Read after Write (RAW) dependency between these two lines:

Read:

float Lyy = (v12 - 2*v22 + v32);

Write:

hessResps[j+scaleCyclesLevel] = hessianResponse(blurs[j+scaleCyclesLevel], curSigma*curSigma);

but I really don't understand why this happens (even though I know what a RAW dependency means).

This is the optimization report:

   LOOP BEGIN at /home/luca/Dropbox/HKUST/CloudCache/cloudcache/CloudCache/Descriptors/hesaff/pyramid.cpp(92,7)
      remark #17104: loop was not parallelized: existence of parallel dependence
      remark #17106: parallel dependence: assumed ANTI dependence between *(in+cols*4) (95:28) and *out (105:11)
      remark #17106: parallel dependence: assumed FLOW dependence between *out (105:11) and *(in+cols*4) (95:28)
      remark #15344: loop was not vectorized: vector dependence prevents vectorization
      remark #15346: vector dependence: assumed ANTI dependence between *(in+cols*4) (95:28) and *out (105:11)
      remark #15346: vector dependence: assumed FLOW dependence between *out (105:11) and *(in+cols*4) (95:28)
   LOOP END

Line 95 is:

     const float v13 = in[-stride];

Line 105 is:

     *out = (Lxx * Lyy - Lxy * Lxy)*norm2;

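What the optimization report is telling you is that some values in one iteration of the loop depend on values from the previous iteration. In particular, the "move window" block copies values between locals, so the values of v11, v12, etc. in the next iteration depend on the values of v12, v13, etc. in this iteration. That prevents the compiler from vectorizing the loop.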

The solution is to initialize all nine v variables inside the body of the c loop.

I don't know whether fixing this will also clear up the original RAW issue.

Another tweak is to move scaleCyclesLevel out of the j loop (so that it sits in the i loop), since its value does not depend on j.
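A small sketch of the hoisting, with OpenMP left out. The helper responseIndices is hypothetical and only lists the indices that the loop nest would write to. One caveat, stated as an assumption to check against your OpenMP version: a collapse(2) clause generally expects the two loops to be perfectly nested, so a statement placed between them may not be accepted together with that clause.

```cpp
#include <vector>

// Hypothetical helper (not from the original post): returns the flat
// indices j + scaleCycles * i in the order the loop nest visits them.
std::vector<int> responseIndices(int levels, int scaleCycles)
{
    std::vector<int> indices;
    for (int i = 0; i < levels; ++i)
    {
        // Depends only on i, so it is computed once per level.
        const int scaleCyclesLevel = scaleCycles * i;
        for (int j = 1; j <= scaleCycles; ++j)
        {
            indices.push_back(j + scaleCyclesLevel);
        }
    }
    return indices;
}
```

For example, responseIndices(2, 3) visits the indices 1 through 6 in order. An optimizing compiler will often hoist such an invariant by itself, so this change is mostly about readability.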

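I don't know how inputImage and outputImage are passed to the function. If their data pointers are not qualified as restrict, the compiler cannot know whether the two buffers overlap, so it has to assume that a write to *out might overwrite an *in value that a later iteration reads. That is the assumed FLOW/ANTI dependence between *out and *(in+cols*4) in the report.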

Look up how to tell your compiler that the image data does not overlap. For gcc, it is __restrict__.
