Memory Race in Cuda

Keywords: race, memory. Last updated: 2023-10-16

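I have a __global__ function that takes an array and an array of indices. It needs to find words from a dictionary in a given sequence, together with the position where each word starts.

But I see that threads overwrite each other's results, so I assume this is a memory race. What can I do?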

__global__ void find_words(int* dictionary, int dictionary_size, int* indeces,
                           int indeces_size, int* sequence, int sequence_size,
                           int longest_word, int* devWords, int* counter)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    int start = id * (CHUNK_SIZE - longest_word);
    int finish = start + CHUNK_SIZE;
    int word_index = -1;
    if (finish > sequence_size)
    {
        finish = sequence_size;
    }
    // search in a closed area
    while (start < finish)
    {
        find_word_in_phoneme_dictionary_kernel(dictionary, dictionary_size,
                indeces, indeces_size, sequence, &word_index, start, finish);
        if (word_index >= 0 && word_index <= indeces[indeces_size - 1])
        {
            devWords[*counter]     = word_index;
            devWords[*counter + 1] = start;      // index in sequence
            *counter += 2;
            start += dictionary[word_index];
        }
        else
        {
            start++;
        }
    }
    __syncthreads();
}

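I also tried giving each thread its own array and counter to store its results, and then gathering the results of all threads afterwards. But I don't understand how to implement the gather in CUDA. Any help?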
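One common way to do such a gather is a two-pass scheme: each thread first counts its matches, an exclusive prefix sum over the counts gives every thread its own write offset, and a second pass writes the results. The following is only a minimal sketch under simplifying assumptions: a "match" is any element equal to target (a stand-in for the dictionary search), the chunks do not overlap, and the names count_pass, write_pass and gather_positions are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <thrust/execution_policy.h>
#include <thrust/scan.h>

// Pass 1: each thread counts the matches inside its own chunk.
__global__ void count_pass(const int* seq, int n, int chunk, int target,
                           int* counts, int num_chunks)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= num_chunks) return;
    int begin = id * chunk;
    int end = min(begin + chunk, n);
    int c = 0;
    for (int i = begin; i < end; ++i)
        if (seq[i] == target) ++c;
    counts[id] = c;
}

// Pass 2: each thread writes its matches starting at its private offset,
// so no two threads ever touch the same output slot.
__global__ void write_pass(const int* seq, int n, int chunk, int target,
                           const int* offsets, int* positions, int num_chunks)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= num_chunks) return;
    int begin = id * chunk;
    int end = min(begin + chunk, n);
    int out = offsets[id];
    for (int i = begin; i < end; ++i)
        if (seq[i] == target) positions[out++] = i;
}

// Host helper (hypothetical): fills h_positions (room for n ints) with the
// match positions in ascending order and returns how many were found.
// Error checking is omitted for brevity.
int gather_positions(const int* h_seq, int n, int chunk, int target,
                     int* h_positions)
{
    if (n <= 0 || chunk <= 0) return 0;
    int num_chunks = (n + chunk - 1) / chunk;
    int *d_seq, *d_counts, *d_offsets, *d_positions;
    cudaMalloc(&d_seq, n * sizeof(int));
    cudaMalloc(&d_counts, num_chunks * sizeof(int));
    cudaMalloc(&d_offsets, num_chunks * sizeof(int));
    cudaMalloc(&d_positions, n * sizeof(int)); // worst case: every element matches
    cudaMemcpy(d_seq, h_seq, n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (num_chunks + threads - 1) / threads;
    count_pass<<<blocks, threads>>>(d_seq, n, chunk, target, d_counts, num_chunks);

    // offsets[i] = counts[0] + ... + counts[i-1]
    thrust::exclusive_scan(thrust::device, d_counts, d_counts + num_chunks, d_offsets);

    // Total number of matches = last offset + last count.
    int last_offset = 0, last_count = 0;
    cudaMemcpy(&last_offset, d_offsets + num_chunks - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last_count, d_counts + num_chunks - 1, sizeof(int), cudaMemcpyDeviceToHost);
    int total = last_offset + last_count;

    write_pass<<<blocks, threads>>>(d_seq, n, chunk, target, d_offsets, d_positions, num_chunks);
    cudaMemcpy(h_positions, d_positions, total * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_seq); cudaFree(d_counts); cudaFree(d_offsets); cudaFree(d_positions);
    return total;
}
```

This does the search twice, but it needs no atomics and the output order is deterministic. With overlapping chunks, as in the kernel above, the same word could be reported by two neighbouring threads, so duplicates would have to be handled separately.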

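I think the problem is that your counter is read and incremented by multiple threads at the same time. As a result, several threads use the same counter value as the index into the array. Use int atomicAdd(int* address, int val); to increment the counter instead. The code would look like this: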

int oldCounter = atomicAdd(counter, 2);
devWords[oldCounter]   = word_index;
devWords[oldCounter+1] = start;

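Note that I increment counter before accessing the array. atomicAdd(...) returns the old value of the counter, which I then use to index the array. However, atomic operations on the same address are serialized, which means the counter increments cannot run in parallel. The rest of the code still runs in parallel.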
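To see the atomicAdd pattern in isolation, here is a small self-contained sketch. It is not the original kernel: the dictionary search is replaced by a plain comparison against target, and find_matches, collect_matches and the capacity parameter are hypothetical names introduced for illustration. The capacity check is an added assumption so that a full result buffer does not cause out-of-bounds writes.

```cuda
#include <cuda_runtime.h>

// One thread per element; a "match" is any element equal to target
// (a stand-in for the dictionary lookup in the question).
__global__ void find_matches(const int* sequence, int sequence_size, int target,
                             int* results, int capacity, int* counter)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id >= sequence_size || sequence[id] != target) return;

    // Reserve two consecutive slots. atomicAdd returns the value the counter
    // had before the addition, so no other thread can get the same slots.
    int slot = atomicAdd(counter, 2);
    if (slot + 1 < capacity) {
        results[slot]     = target; // value that was found
        results[slot + 1] = id;     // index in sequence
    }
}

// Host helper (hypothetical): copies back the (value, position) pairs and
// returns the number of pairs stored, or -1 if the kernel reported an error.
int collect_matches(const int* h_sequence, int n, int target,
                    int* h_results, int capacity)
{
    int *d_sequence, *d_results, *d_counter;
    cudaMalloc(&d_sequence, n * sizeof(int));
    cudaMalloc(&d_results, capacity * sizeof(int));
    cudaMalloc(&d_counter, sizeof(int));
    cudaMemcpy(d_sequence, h_sequence, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_counter, 0, sizeof(int)); // the counter must start at zero

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    find_matches<<<blocks, threads>>>(d_sequence, n, target,
                                      d_results, capacity, d_counter);
    cudaError_t err = cudaDeviceSynchronize();

    int h_counter = 0;
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    int even_capacity = capacity - (capacity % 2);
    int used = h_counter < even_capacity ? h_counter : even_capacity;
    cudaMemcpy(h_results, d_results, used * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_sequence); cudaFree(d_results); cudaFree(d_counter);
    return err == cudaSuccess ? used / 2 : -1;
}
```

The order in which the pairs land in results depends on thread scheduling and can differ between runs, so sort them on the host if a stable order is needed.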