从静态初始化代码启动CUDA内核时遇到问题

Trouble launching CUDA kernels from static initialization code

本文关键字:内核 遇到 问题 CUDA 启动 静态 初始化 代码      更新时间:2023-10-16

我有一个在其构造函数中调用内核的类,如下所示:

"ScalarField.h"

#include <iostream>
    void ERROR_CHECK(cudaError_t err,const char * msg) {
        if(err!=cudaSuccess) {
            std::cout << msg << " : " << cudaGetErrorString(err) << std::endl;
            std::exit(-1);
        }
    }
    class ScalarField {
    public:
        float* array;
        int dimension;
        ScalarField(int dim): dimension(dim) {
            std::cout << "Scalar Field" << std::endl;
            ERROR_CHECK(cudaMalloc(&array, dim*sizeof(float)),"cudaMalloc");
        }
    };
"classA.h"

#include "ScalarField.h"

static __global__ void KernelSetScalarField(ScalarField v) {
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    if (index < v.dimension) v.array[index] = 0.0f;
}
class A {
public:
    ScalarField v;
    A(): v(ScalarField(3)) {
        std::cout << "Class A" << std::endl;
        KernelSetScalarField<<<1, 32>>>(v);
        ERROR_CHECK(cudaGetLastError(),"Kernel");
    }
};
"main.cu"

#include "classA.h"
A a_object;
int main() {
    std::cout << "Main" << std::endl;
    return 0;
}

如果我在main (A a_object;)上实例化这个类,我不会得到任何错误。然而,如果我在main之外实例化它,就在定义它(class A {...} a_object;)之后,当内核启动时,我得到一个"无效的设备函数"错误。为什么会这样呢?

编辑

更新代码,提供更完整的示例。

编辑2

按照Raxvan评论中的建议,我想说我在ScalarField构造函数中使用的dimensions变量也在main之外定义(在另一个类中),但在其他一切之前。这就是解释吗?调试器显示了dimensions的正确值。

短版:

class A在main之外实例化时问题的根本原因是,在调用class A的构造函数之前,需要用内核初始化CUDA运行时库的特定钩子例程没有运行。这是因为在c++执行模型中不能保证静态对象的实例化和初始化顺序。你的全局作用域类在做CUDA设置的全局作用域对象被初始化之前被实例化。在调用内核代码之前,内核代码从未被加载到上下文中,并且导致运行时错误。

据我所知,这是CUDA运行时API的真正限制,而不是在用户代码中容易修复的东西。在您的简单示例中,您可以将内核调用替换为对cudaMemset或一个非基于符号的运行时API memset函数的调用,并且它将工作。这个问题完全局限于通过运行时API在运行时加载的用户内核或设备符号。出于这个原因,空的默认构造函数也可以解决您的问题。从设计的角度来看,我对任何在构造函数中调用内核的模式都持怀疑态度。添加一个不依赖于默认构造函数或析构函数的GPU设置/拆除类的特定方法将是一个更干净,更不容易出错的设计。

详细

:

有一个内部生成的例程(__cudaRegisterFatBinary),必须运行它来加载和注册内核,纹理和静态定义的设备符号包含在任何运行时API程序的fatbin有效载荷中,CUDA驱动程序API在内核可以被无错误地调用之前。这是运行时API的"惰性"上下文初始化特性的一部分。您可以自己确认如下:

这是您发布的修改后的示例的gdb跟踪。注意,我在__cudaRegisterFatBinary中插入了一个断点,在调用静态A构造函数和内核启动失败之前,没有到达这个断点:

talonmies@box:~$ gdb a.out 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/talonmies/a.out...done.
(gdb) break '__cudaRegisterFatBinary' 
Breakpoint 1 at 0x403180
(gdb) run
Starting program: /home/talonmies/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Scalar Field
[New Thread 0x7ffff5a63700 (LWP 10774)]
Class A
Kernel : invalid device function 
[Thread 0x7ffff5a63700 (LWP 10774) exited]
[Inferior 1 (process 10771) exited with code 0377]
下面是相同的过程,这次A实例化在main内部(保证在执行惰性设置的对象初始化之后发生):
talonmies@box:~$ cat main.cu
#include "classA.h"

int main() {
    A a_object;
    std::cout << "Main" << std::endl;
    return 0;
}
talonmies@box:~$ nvcc --keep -arch=sm_30 -g main.cu
talonmies@box:~$ gdb a.out 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/talonmies/a.out...done.
(gdb) break '__cudaRegisterFatBinary' 
Breakpoint 1 at 0x403180
(gdb) run
Starting program: /home/talonmies/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x0000000000403180 in __cudaRegisterFatBinary ()
(gdb) cont
Continuing.
Scalar Field
[New Thread 0x7ffff5a63700 (LWP 11084)]
Class A
Main
[Thread 0x7ffff5a63700 (LWP 11084) exited]
[Inferior 1 (process 11081) exited normally]

如果这对您来说真的是一个严重的问题,我建议您联系NVIDIA开发人员支持并提出错误报告。