CUDA blockDim.y always == 1

Updated: 2023-10-16

I always get blockDim.y == 1. No matter what values I set in numBlocks, I always get the same value.

__global__ void CalcVideo(unsigned char *original, unsigned char *candidate, int *answer)
{
    // Print the block index, block dimensions, and thread index seen by each thread.
    printf("block id.x = %d blockid.y=%d blockdim.x = %d blockdim.y = %d Thread id= %d \n",
        blockIdx.x, blockIdx.y, blockDim.x, blockDim.y, threadIdx.x);
}

int ORIGINAL_FRAMES = 3;
int CANDIDATE_FRAMES = 2;
int FRAME_LENGHT = 3;
dim3 numBlocks(ORIGINAL_FRAMES, CANDIDATE_FRAMES);  // grid of 3 x 2 blocks
dim3 threadsPerBlock(3);                            // 3 threads per block (3 x 1 x 1)
CalcVideo<<<numBlocks, threadsPerBlock>>>(original_device, candidate_device, answer_device);

The correct number of blocks is executed in y, but why does the program give me the wrong blockDim.y size?

block id.x = 1 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 1 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 1 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 1 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 1 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 1 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 0 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 0 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 0 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 0 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 0 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 0 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 2 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 2 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 2 blockid.y=1 blockdim.x = 3 blockdim.y = 1 Thread id= 2
block id.x = 2 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 0
block id.x = 2 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 1
block id.x = 2 blockid.y=0 blockdim.x = 3 blockdim.y = 1 Thread id= 2

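blockDim holds the dimensions of a single block. In your case, you pass threadsPerBlock as the block dimensions, which makes each block 3 x 1 x 1. The first kernel launch parameter, numBlocks, controls the dimensions of the grid of blocks; inside the kernel you can read it as gridDim. So blockDim.y == 1 is the expected value here, and the 3 x 2 you set in numBlocks shows up as gridDim.x and gridDim.y.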
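To make the distinction concrete, here is a minimal sketch that launches a kernel with the same grid as in the question, once with a 1D block and once with a 2D block, and reports what gridDim and blockDim contain. The names ReportDims and QueryDims are hypothetical helpers written for this illustration, not part of the original code, and the sketch assumes a CUDA-capable device is available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: a single thread records the grid and block
// dimensions that the launch configuration produced.
__global__ void ReportDims(unsigned int *out)
{
    if (blockIdx.x == 0 && blockIdx.y == 0 &&
        threadIdx.x == 0 && threadIdx.y == 0) {
        out[0] = gridDim.x;   // set by the first launch parameter
        out[1] = gridDim.y;
        out[2] = blockDim.x;  // set by the second launch parameter
        out[3] = blockDim.y;
    }
}

// Hypothetical helper: launches ReportDims with the given configuration
// and copies the four recorded values back to the host.
// Returns false if any CUDA call fails.
bool QueryDims(dim3 numBlocks, dim3 threadsPerBlock, unsigned int out[4])
{
    unsigned int *out_device = nullptr;
    if (cudaMalloc(&out_device, 4 * sizeof(unsigned int)) != cudaSuccess)
        return false;

    ReportDims<<<numBlocks, threadsPerBlock>>>(out_device);

    // A synchronous cudaMemcpy waits for the kernel and also surfaces
    // errors from the launch.
    cudaError_t err = cudaMemcpy(out, out_device, 4 * sizeof(unsigned int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(out_device);
    return err == cudaSuccess;
}

int main()
{
    unsigned int dims[4];

    // Same configuration as in the question: 3 x 2 grid, 1D block of 3 threads.
    if (!QueryDims(dim3(3, 2), dim3(3), dims)) return 1;
    printf("gridDim = %u x %u, blockDim = %u x %u\n",
           dims[0], dims[1], dims[2], dims[3]);

    // Same grid, but with a 2D block of 3 x 2 threads.
    if (!QueryDims(dim3(3, 2), dim3(3, 2), dims)) return 1;
    printf("gridDim = %u x %u, blockDim = %u x %u\n",
           dims[0], dims[1], dims[2], dims[3]);

    return 0;
}
```

With the first launch, blockDim.y should come back as 1 because dim3(3) leaves the y and z components at their default of 1, while gridDim reports the 3 x 2 from numBlocks. Only the second launch, which passes a 2D dim3 as the block size, should yield blockDim.y == 2.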


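As a side note: I assume the extremely low number and size of blocks in the question are for testing purposes only, since they would leave any GPU underutilized in practice.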