Stream compaction with Thrust; best practices and fastest way?

Last updated: 2023-10-16

I'm interested in porting some existing code to use Thrust, to see whether I can speed it up on the GPU with relative ease.

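What I'm looking to accomplish is a stream compaction operation, where only the non-zero elements are kept. I have this mostly working, per the example code below. The part I'm unsure how to handle is all the extra padding left in d_res, and therefore in h_res, after the compaction happens.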

The example just uses the sequence 0-99 with all the even entries set to zero. This is only an example; the real problem will be a general sparse array.

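The answer to this question helped me a lot, although there, when it comes to reading the data back, the size is a known constant: How to quickly compact a sparse array with CUDA C?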

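I suspect I can work around this either by counting the zeros in d_src and allocating d_res to hold only the remaining elements, or by doing the count after the compaction and copying back only that many elements. Is that really the right way to do it?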

I have a feeling this could be solved quite easily through some clever use of iterators or some other feature of Thrust.

#include <iostream>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>
//Predicate functor
struct is_not_zero
{
    __host__ __device__
        bool operator()(const int x) const
    {
        return (x != 0);
    }
};
using namespace std;
int main(void)
{
    const int N = 100; //int, to match the signed loop counters below
    //Host Vector
    thrust::host_vector<int> h_src(N);
    //Fill with some zero and some nonzero data, as an example
    for (int i = 0; i < N; i++){
        if (i % 2 == 0){
            h_src[i] = 0;
        }
        else{
            h_src[i] = i;
        }
    }
    //Print out source data
    cout << "Source:" << endl;
    for (int i = 0; i < N; i++){
        cout << h_src[i] << " ";
    }
    cout << endl;
    //copies to device
    thrust::device_vector<int> d_src = h_src;
    //Result vector
    thrust::device_vector<int> d_res(d_src.size());
    //Copy non-zero elements from d_src to d_res
    thrust::copy_if(d_src.begin(), d_src.end(), d_res.begin(), is_not_zero());
    //Copy back to host
    thrust::host_vector<int> h_res(d_res.begin(), d_res.end());
    //thrust::host_vector<int> h_res = d_res; //Or just this?
    //Show results
    cout << "h_res size is " << h_res.size() << endl;
    cout << "Result after remove:" << endl;
    for (size_t i = 0; i < h_res.size(); i++){
        cout << h_res[i] << " ";
    }
    cout << endl;
    return 0;
}

Also, I'm new to Thrust, so if the code above has any obvious flaws that go against recommended practice for using Thrust, please let me know.

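Likewise, speed is always of interest. From reading various Thrust tutorials, it seems that small changes here and there can be big speed savers or wasters. So please let me know if there is a smart way to speed this up.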

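You appear to have overlooked that copy_if returns an iterator pointing to the end of the data copied by the stream compaction operation. So all that is required is this: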

//copies to device
thrust::device_vector<int> d_src = h_src;
//Result vector
thrust::device_vector<int> d_res(d_src.size());
//Copy non-zero elements from d_src to d_res
auto result_end = thrust::copy_if(d_src.begin(), d_src.end(), d_res.begin(), is_not_zero());
//Copy back to host
thrust::host_vector<int> h_res(d_res.begin(), result_end);

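Doing this sizes h_res to hold only the non-zero elements, and copies only those elements from the output of the stream compaction. No extra computation is required.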