Caffe with cuDNN versions 4 and 5

Keywords: version, cuDNN, Caffe. Last updated: 2023-10-16

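I use cuDNN to accelerate my Caffe program. I started with cuDNN 4 and it worked fine, but after I updated to cuDNN 5.0 the pow function stopped working. The call is in the batch_norm layer: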

caffe_gpu_powx(variance_.count(), variance_.gpu_data(), Dtype(0.5), variance_.mutable_gpu_data());

After the call, the data is unchanged. The pow function is defined as follows, identical to the definition in the Caffe GitHub repository.

template <typename Dtype>
__global__ void powx_kernel(const int n, const Dtype* a,
    const Dtype alpha, Dtype* y) {
  // Each thread handles one or more indices in [0, n).
  CUDA_KERNEL_LOOP(index, n) {
    y[index] = pow(a[index], alpha);
  }
}

template <>
void caffe_gpu_powx<float>(const int N, const float* a,
    const float alpha, float* y) {
  // NOLINT_NEXT_LINE(whitespace/operators)
  powx_kernel<float><<<CAFFE_GET_BLOCKS(N), CAFFE_CUDA_NUM_THREADS>>>(
      N, a, alpha, y);
}

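I found my mistake. I had originally set code generation to "compute_52,sm_52" for a TITAN X, but I am now running on a lower-end GPU, so it should be set to "compute_20,sm_20". A kernel built only for compute capability 5.2 cannot be loaded on a device with a lower compute capability, so the launch fails, and because the result of the launch is never checked, the only visible symptom is that the output buffer is left unchanged. The cuDNN upgrade was not the cause.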
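A silent failure like this is easier to catch if the program prints the device's compute capability and checks for errors right after a kernel launch. The following is a minimal standalone sketch, not Caffe code: `powx_demo_kernel` and `CHECK_CUDA` are hypothetical names introduced here for illustration, and the kernel uses a plain grid-stride loop in place of Caffe's `CUDA_KERNEL_LOOP` macro. It assumes device 0 is the GPU in question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Stand-in for powx_kernel: y[i] = a[i]^alpha, using a grid-stride loop.
__global__ void powx_demo_kernel(int n, const float* a, float alpha, float* y) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += blockDim.x * gridDim.x) {
    y[i] = powf(a[i], alpha);
  }
}

// Hypothetical helper for this sketch: print the CUDA error and leave main().
#define CHECK_CUDA(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      return 1;                                                   \
    }                                                             \
  } while (0)

int main() {
  // Report what the device supports, to compare against the build settings.
  cudaDeviceProp prop;
  CHECK_CUDA(cudaGetDeviceProperties(&prop, 0));
  std::printf("device 0: %s, compute capability %d.%d\n",
              prop.name, prop.major, prop.minor);

  const int n = 4;
  const float host_in[n] = {1.0f, 4.0f, 9.0f, 16.0f};
  float host_out[n] = {0.0f, 0.0f, 0.0f, 0.0f};
  float* dev = nullptr;
  CHECK_CUDA(cudaMalloc(&dev, n * sizeof(float)));
  CHECK_CUDA(cudaMemcpy(dev, host_in, sizeof(host_in), cudaMemcpyHostToDevice));

  // In-place update, like the batch_norm call on variance_.
  powx_demo_kernel<<<1, 32>>>(n, dev, 0.5f, dev);
  CHECK_CUDA(cudaGetLastError());       // errors from the launch itself
  CHECK_CUDA(cudaDeviceSynchronize());  // errors raised while the kernel runs

  CHECK_CUDA(cudaMemcpy(host_out, dev, sizeof(host_out), cudaMemcpyDeviceToHost));
  for (int i = 0; i < n; ++i) {
    std::printf("pow(%g, 0.5) = %g\n", host_in[i], host_out[i]);
  }
  CHECK_CUDA(cudaFree(dev));
  return 0;
}
```

If this is built for an architecture the GPU cannot run (for example with `nvcc -gencode arch=compute_52,code=sm_52` on an older card), the `cudaGetLastError()` check should report a launch error instead of letting the program continue with unchanged data. The exact error message depends on the CUDA version.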