Thrust transform throws error: "bulk_kernel_by_value: an illegal memory access was encountered"

Updated: 2023-10-16

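I'm fairly new to CUDA/Thrust and have a problem with a code snippet. To make it easier, I have trimmed it down to the bare minimum. The code is the following: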

struct functor{
functor(float (*g)(const float&)) : _g{g} {}
__host__ __device__ float operator()(const float& x) const { 
        return _g(x);
    }
private:
    float (*_g)(const float&);
};
__host__ __device__ float g(const float& x){return 3*x;}
int main(void){
thrust::device_vector<float> X(4,1);
thrust::transform(X.begin(), X.end(), X.begin(), functor(&g));
}

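The idea is that I can pass any function to the functor, so that I can apply that function to every element in the vector. Unfortunately, I'm not sure why I get the error described. I compile with -w -O3 -shared -arch=sm_20 -std=c++11 -DTHRUST_DEBUG

I'd be grateful for any help you can give me :)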

The address of a __device__ function (or a __host__ __device__ function) cannot be taken in host code for use on the device:

thrust::transform(X.begin(), X.end(), X.begin(), functor(&g));
                                                         ^
                                                     You will not get the 
                                                     __device__ function
                                                     address here

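There are many questions on Stack Overflow that discuss the use of CUDA device function addresses passed via kernel calls. This answer links to several that may be of interest.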

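One possible way to fix this is to take the device function address in device code and copy it to the host, so that it can be used in the way you describe: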

$ cat t1057.cu
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

struct functor{
    functor(float (*g)(const float&)) : _g{g} {}
    __host__ __device__ float operator()(const float& x) const {
        return _g(x);
    }
private:
    float (*_g)(const float&);
};

__host__ __device__ float g(const float& x){return 3*x;}

// Initialized in device code, so d_g holds the device address of g.
__device__ float (*d_g)(const float&) = g;

int main(void){
    float (*h_g)(const float&) = NULL;
    // Copy the device-side address to the host; it may only be called on the device.
    cudaMemcpyFromSymbol(&h_g, d_g, sizeof(void *));
    thrust::device_vector<float> X(4,1);
    thrust::transform(X.begin(), X.end(), X.begin(), functor(h_g));
    thrust::copy_n(X.begin(), X.size(), std::ostream_iterator<float>(std::cout, ","));
    std::cout << std::endl;
}
$ nvcc -o t1057 t1057.cu -std=c++11
$ ./t1057
3,3,3,3,
$

Another possible approach, leveraging the always-clever work by @m.s. here, uses templates:

$ cat t1057.cu
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

typedef float(*fptr_t)(const float&);

// The function is a template parameter, so no address is stored or passed at run time.
template <fptr_t F>
struct functor{
    __host__ __device__ float operator()(const float& x) const {
        return F(x);
    }
};

__host__ __device__ float g(const float& x){return 3*x;}

int main(void){
    thrust::device_vector<float> X(4,1);
    thrust::transform(X.begin(), X.end(), X.begin(), functor<g>());
    thrust::copy_n(X.begin(), X.size(), std::ostream_iterator<float>(std::cout, ","));
    std::cout << std::endl;
}
$ nvcc -o t1057 t1057.cu -std=c++11
$ ./t1057
3,3,3,3,
$
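
A third option, not part of the original answers, is to avoid function pointers altogether and write the operation as a callable type. The sketch below assumes the body of g can be moved into a struct; the names triple and apply_on_device are hypothetical and introduced only for illustration. Because Thrust receives the type itself, the compiler resolves the call in device code and no address has to cross the host/device boundary.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical operation type: the body of g moved into a callable struct.
struct triple {
    __host__ __device__ float operator()(const float& x) const { return 3 * x; }
};

// Hypothetical helper: fills a device vector with n copies of value,
// applies op to every element on the device, and copies the result back.
template <typename Op>
std::vector<float> apply_on_device(Op op, std::size_t n, float value) {
    thrust::device_vector<float> X(n, value);
    thrust::transform(X.begin(), X.end(), X.begin(), op);
    std::vector<float> out(n);
    thrust::copy(X.begin(), X.end(), out.begin());  // device-to-host copy
    return out;
}

int main(void) {
    std::vector<float> result = apply_on_device(triple(), 4, 1.0f);
    for (std::size_t i = 0; i < result.size(); ++i) std::cout << result[i] << ",";
    std::cout << std::endl;
}
```

Built the same way as the listings above (nvcc with -std=c++11), this should print 3,3,3,3, as they do. The trade-off is that the operation is chosen at compile time, as in the template approach, rather than being selectable through a pointer at run time.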

It is also helpful to look at the question "How does CUDA's cudaMemcpyFromSymbol work?".

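cudafe (the front end) creates an ordinary global variable, as in C, and also a CUDA-specific PTX variable. The global C variable is used so that the host program can refer to the variable by its address, and the PTX variable is used for the actual storage of the variable. The presence of the host variable also allows the host compiler to parse the program successfully. When the device program executes, it operates on the PTX variable whenever it manipulates the variable by name.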

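Basically, the host and the device have different address spaces, and an address from one is not valid in the other. That is, on the device you can only use a function pointer that belongs to the device address space, not one taken on the host.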