Passing char array to CUDA Kernel

Keywords: CUDA, kernel, char, array. Updated: 2023-10-16

I am trying to pass a char array containing 10,000 words, read from a txt file in the main function, to a CUDA kernel function.

The words are transferred from the host to the device like this:

(Main function code:)

//.....
    const int text_length = 20;
    char (*wordList)[text_length] = new char[10000][text_length];
    char *dev_wordList;
    for(int i=0; i<number_of_words; i++)
    {
        file>>wordList[i];
        cout<<wordList[i]<<endl;
    }
    cudaMalloc((void**)&dev_wordList, 20*number_of_words*sizeof(char));
    cudaMemcpy(dev_wordList, &(wordList[0][0]), 20 * number_of_words * sizeof(char), cudaMemcpyHostToDevice);
    //Setup execution parameters
    int n_blocks = (number_of_words + 255)/256;
    int threads_per_block = 256;

    dim3 grid(n_blocks, 1, 1);
    dim3 threads(threads_per_block, 1, 1);

    cudaPrintfInit();
    testKernel<<<grid, threads>>>(dev_wordList);
    cudaDeviceSynchronize();
    cudaPrintfDisplay(stdout,true);
    cudaPrintfEnd();

(Kernel function code:)

__global__ void testKernel(char* d_wordList)
{
    //access thread id
    const unsigned int bid = blockIdx.x;
    const unsigned int tid = threadIdx.x;
    const unsigned int index = bid * blockDim.x + tid;
    cuPrintf("!! %c%c%c%c%c%c%c%c%c%c\n", d_wordList[index * 20 + 0],
                                          d_wordList[index * 20 + 1],
                                          d_wordList[index * 20 + 2],
                                          d_wordList[index * 20 + 3],
                                          d_wordList[index * 20 + 4],
                                          d_wordList[index * 20 + 5],
                                          d_wordList[index * 20 + 6],
                                          d_wordList[index * 20 + 7],
                                          d_wordList[index * 20 + 8],
                                          d_wordList[index * 20 + 9]);
}

Is there a way to manipulate them more easily? (I would like to have one word per element/position.) I tried <string>, but I cannot use it in CUDA device code.

cuPrintf("%s\n", d_wordList+(index*20));

That should work, provided your strings are zero-terminated.
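For reference, here is a minimal sketch of the same idea using the built-in device-side printf (available on compute capability 2.0 and later) instead of cuPrintf. The names kTextLength, word_at, printWords and printWordsOnDevice are made up for illustration and are not from the question. The sketch also passes number_of_words into the kernel, since the grid is rounded up to whole blocks and the spare threads would otherwise read past the end of the buffer.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Assumed slot size per word, matching text_length (20) in the question.
constexpr int kTextLength = 20;

// Pointer to the start of word i in a flat buffer of fixed-size slots.
__host__ __device__ inline const char* word_at(const char* list, int i)
{
    return list + static_cast<std::size_t>(i) * kTextLength;
}

// One thread per word; threads beyond number_of_words do nothing.
__global__ void printWords(const char* d_wordList, int number_of_words)
{
    const int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < number_of_words)
        printf("!! %s\n", word_at(d_wordList, index));  // needs a '\0' in each slot
}

// Hypothetical host helper: copies the packed words to the device and prints them.
cudaError_t printWordsOnDevice(const char* h_wordList, int number_of_words)
{
    if (number_of_words <= 0) return cudaSuccess;  // nothing to launch

    char* d_wordList = nullptr;
    const std::size_t bytes = static_cast<std::size_t>(number_of_words) * kTextLength;
    cudaError_t err = cudaMalloc(&d_wordList, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(d_wordList, h_wordList, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads_per_block = 256;
        const int n_blocks = (number_of_words + threads_per_block - 1) / threads_per_block;
        printWords<<<n_blocks, threads_per_block>>>(d_wordList, number_of_words);
        err = cudaDeviceSynchronize();  // waits for the kernel and flushes device printf
    }
    cudaFree(d_wordList);
    return err;
}
```

With the question's layout this would be called as printWordsOnDevice(&wordList[0][0], number_of_words). The order in which the lines appear is not guaranteed, because threads are not scheduled in index order.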

Update:

This line:

char (*wordList)[text_length] = new char[10000][text_length];

looks odd to me. Generally, an array of pointers to char would be allocated like this:

char** wordList = new char*[10000];
for (int i=0;i<10000;i++) wordList[i] = new char[20];

In this case, wordList[i] will be a pointer to string number i.
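One caveat with the array-of-pointers layout: each word lives in its own allocation, so the single cudaMemcpy from the question would no longer copy the words themselves. A rough sketch of packing them into one contiguous block of fixed-size slots first is below. flattenWords and kTextLength are hypothetical names, and the code assumes every wordList[i] is a zero-terminated string.

```cuda
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed slot size per word, matching text_length (20) in the question.
constexpr int kTextLength = 20;

// Packs separately allocated strings into one contiguous, fixed-stride buffer.
std::vector<char> flattenWords(const char* const* wordList, int number_of_words)
{
    // Filled with '\0', so every slot stays zero-terminated after the copy.
    std::vector<char> flat(static_cast<std::size_t>(number_of_words) * kTextLength, '\0');
    for (int i = 0; i < number_of_words; ++i) {
        // Copy at most kTextLength-1 characters; longer words are truncated.
        std::strncpy(&flat[static_cast<std::size_t>(i) * kTextLength],
                     wordList[i], kTextLength - 1);
    }
    return flat;
}
```

The result can then be transferred in one call, using flat.data() as the source pointer and flat.size() as the byte count for cudaMemcpy.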

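Update #2:

If you need to store the strings as one contiguous block, and you are sure that none of them is longer than text_length-1 characters, then you can do this: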

char *wordList = new char[10000*text_length];
for(int i=0; i<number_of_words; i++)
{
    file>>wordList+(i*text_length);
    cout<<wordList+(i*text_length)<<endl;
}

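In this case, wordList+(i*text_length) points to the beginning of string number i. The string is zero-terminated, because that is how it is read from the file, so you can print it as described in this answer. However, if any of your strings is longer than text_length-1 characters, you will still have a problem, because the read writes past the end of that string's slot.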
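If over-long words are a possibility, one way to guard against that overflow is to read each word into a std::string first and copy only what fits. This is a sketch under that assumption rather than part of the original answer. readWords and kTextLength are hypothetical names, and truncating long words is a design choice you may not want.

```cuda
#include <cstddef>
#include <cstring>
#include <istream>
#include <string>

// Assumed slot size per word, matching text_length (20) in the question.
constexpr int kTextLength = 20;

// Reads up to max_words whitespace-separated words into buf, which must hold
// max_words * kTextLength bytes. Returns the number of words stored.
int readWords(std::istream& in, char* buf, int max_words)
{
    // Zero the whole buffer so every slot is terminated, even unused ones.
    std::memset(buf, 0, static_cast<std::size_t>(max_words) * kTextLength);

    std::string word;
    int count = 0;
    while (count < max_words && (in >> word)) {
        // Copy at most kTextLength-1 characters; the zeroed tail is the terminator.
        std::strncpy(buf + static_cast<std::size_t>(count) * kTextLength,
                     word.c_str(), kTextLength - 1);
        ++count;
    }
    return count;
}
```

The returned count can stand in for number_of_words when sizing the cudaMemcpy and the kernel launch.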