cudaOccupancyMaxActiveBlocksPerMultiprocessor is undefined

Keywords: undefined is cudaOccupancyMaxActiveBlocksPerMultiprocessor  Updated: 2023-10-16

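I am trying to learn CUDA and use it efficiently. I found a code sample on NVIDIA's website that shows how to determine the block size that makes the most efficient use of the device. The code is as follows: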

#include <iostream>
// Device code
__global__ void MyKernel(int *d, int *a, int *b)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    d[idx] = a[idx] * b[idx];
}
// Host code
int main()
{
    int numBlocks;        // Occupancy in terms of active blocks
    int blockSize = 32;
    // These variables are used to convert occupancy to warps
    int device;
    cudaDeviceProp prop;
    int activeWarps;
    int maxWarps;
    cudaGetDevice(&device);
    cudaGetDeviceProperties(&prop, device);
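    cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &numBlocks,   // out: max active blocks per multiprocessor
        MyKernel,     // kernel to analyze
        blockSize,    // threads per block
        0);           // dynamic shared memory per block, in bytes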
    activeWarps = numBlocks * blockSize / prop.warpSize;
    maxWarps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
    std::cout << "Occupancy: " << (double)activeWarps / maxWarps * 100 << "%" << std::endl;
    return 0;
}

However, when I compile it, I get the following error.

Compile line:

nvcc ben_deneme2.cu -arch=sm_35 -rdc=true -lcublas -lcublas_device -lcudadevrt -o my

Error:

ben_deneme2.cu(25): error: identifier "cudaOccupancyMaxActiveBlocksPerMultiprocessor" is undefined
1 error detected in the compilation of "/tmp/tmpxft_0000623d_00000000-8_ben_deneme2.cpp1.ii".

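Do I need to include a library for this? I could not find its name anywhere on the internet. Or am I doing something wrong? Thanks in advance.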

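The cudaOccupancyMaxActiveBlocksPerMultiprocessor function was introduced in CUDA 6.5. If you have an earlier version of CUDA installed, the function is not available; for example, it does not exist in CUDA 5.5.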

If you want to use the function, you must update CUDA to at least version 6.5.
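As a rough illustration of the version requirement, here is a minimal host-side sketch that checks whether a given CUDA runtime version is new enough for the occupancy API. The helper names (hasOccupancyApi, versionMajor, versionMinor, kOccupancyApiMinVersion) are hypothetical and not part of CUDA; the sketch only assumes the usual CUDA version encoding of 1000 * major + 10 * minor, under which 6.5 is 6050.

```cpp
// CUDA encodes versions as 1000 * major + 10 * minor, so CUDA 6.5 is 6050.
// kOccupancyApiMinVersion and the helpers below are hypothetical names
// used for illustration only; they are not CUDA APIs.
const int kOccupancyApiMinVersion = 6050;

// True if a runtime with this encoded version provides
// cudaOccupancyMaxActiveBlocksPerMultiprocessor (CUDA 6.5 or newer).
bool hasOccupancyApi(int runtimeVersion)
{
    return runtimeVersion >= kOccupancyApiMinVersion;
}

// Decode the major part of an encoded version, e.g. 5050 -> 5.
int versionMajor(int encodedVersion)
{
    return encodedVersion / 1000;
}

// Decode the minor part of an encoded version, e.g. 5050 -> 5.
int versionMinor(int encodedVersion)
{
    return (encodedVersion % 1000) / 10;
}
```

In a .cu file, the encoded version could come from the CUDART_VERSION macro at compile time or from cudaRuntimeGetVersion at run time; running nvcc --version also shows which toolkit is installed.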

People on older versions usually use the CUDA Occupancy Calculator.

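One common heuristic used to choose a good block size is to aim for high occupancy, which is the ratio of the number of active warps per multiprocessor to the maximum number of warps that can be active on the multiprocessor at once. -- CUDA Pro Tip: Occupancy API Simplifies Launch Configuration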
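The ratio in that definition is the same arithmetic the question's code performs after the API call, and it can be done by hand on older toolkits. A minimal sketch, assuming you already know the number of active blocks per multiprocessor (for example from the Occupancy Calculator); the function name occupancyPercent is hypothetical:

```cpp
// Occupancy as a percentage: active warps per multiprocessor divided by
// the maximum number of warps the multiprocessor can hold.
// occupancyPercent is a hypothetical helper, not a CUDA API.
double occupancyPercent(int activeBlocks, int blockSize,
                        int warpSize, int maxThreadsPerSM)
{
    // Warps contributed by all resident blocks (integer division,
    // matching the question's code).
    int activeWarps = activeBlocks * blockSize / warpSize;
    // Upper limit on resident warps per multiprocessor.
    int maxWarps = maxThreadsPerSM / warpSize;
    return 100.0 * activeWarps / maxWarps;
}
```

For example, with illustrative values of 16 active blocks of 32 threads, a warp size of 32, and 2048 threads per multiprocessor, occupancyPercent(16, 32, 32, 2048) gives 16 active warps out of 64, or 25%. This suggests why a block size of 32 tends to leave a multiprocessor underused when the resident block count is the limiting factor.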