C++11 alias templates in CUDA

Keywords: alias, CUDA, C++. Last updated: 2023-10-16

The key question: does the CUDA compiler support alias templates?

I'm using CUDA 7.5 with gcc-4.8 on Ubuntu. All my template classes are defined in headers and #included, so everything is compiled as a single translation unit.

I have a simple cuda_array class that provides a thin wrapper around std::vector. It is essentially a very simple version of thrust::host_vector and thrust::device_vector combined. Its declaration is:

#include <cstddef>
#include <vector>

template <typename T, const size_t N>
class cuda_array {
    std::vector<T> host;
    T *device;
public:
    // lots of type aliases to meet container requirements
    void push() { /* cudaMemcpy(...,H2D); */ }
    void pull() { /* cudaMemcpy(...,D2H); */ }
    T *device_data() { return device; }  // raw device pointer, passed to kernels
    // a few others that aren't relevant here
};

To get a matrix, I just wrote a quick alias template:

template <typename T, const size_t M, const size_t N>
using cuda_matrix = cuda_array<T, M * N>;

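I want to map my matrix-vector multiplication CUDA kernel onto an overloaded operator*, for type safety and ease of use (it is up to the caller to make sure push and pull are called correctly).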

template <typename T, const size_t rows, const size_t cols>
__global__ void matrix_vector_mul(T *A, T *b, T *result) {
    __shared__ T shared_b[cols];
    // rest of it
}
template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v) {
    cuda_array<T, M> result;
    matrix_vector_mul<T, M, N><<<16, 32>>>(m.device_data(), v.device_data(), result.device_data());
    return result;
}

In my 'main.cpp' I added:

cuda_matrix<int,16,32> A;
cuda_array<int,32> b;
auto result = A * b;

The last line produces the error:

error: no operator "*" matches these operands
        operand types are: cuda_matrix<int, 16UL, 32UL> * cuda_array<int, 32UL>

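I looked into every template type deduction problem I could think of, without success. In desperation, I turned the cuda_matrix alias template into a class template: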

template <typename T, const size_t M, const size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};

The compile error went away! So it seems CUDA does not support alias templates yet. Or am I doing something dumb without realizing it?

You have to keep in mind that:

§14.5.7 [temp.alias]/p2:

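When a template-id refers to the specialization of an alias template, it is equivalent to the associated type obtained by substitution of its template-arguments for the template-parameters in the type-id of the alias template. [Note: An alias template name is never deduced. - end note]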
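A minimal sketch of what this note means in practice, assuming a stripped-down, host-only cuda_array (the device pointer, __host__ and the CUDA calls are left out so it builds with any C++11 compiler): the alias names an existing type rather than creating a new one, so cuda_matrix<int, 16, 32> is simply cuda_array<int, 512>.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for the question's cuda_array (assumption: device side omitted).
template <typename T, std::size_t N>
struct cuda_array {
    std::vector<T> host;
    cuda_array() : host(N) {}
};

// The alias from the question.
template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// The alias introduces no new type: both spellings name cuda_array<int, 512>.
static_assert(std::is_same<cuda_matrix<int, 16, 32>, cuda_array<int, 512>>::value,
              "the alias is replaced by its substituted type");

// Different (M, N) pairs with the same product name the same type, so an
// argument of that type carries no record of the M and N it was written with.
static_assert(std::is_same<cuda_matrix<int, 16, 32>, cuda_matrix<int, 32, 16>>::value,
              "16x32 and 32x16 collapse to the same type");
```

Seen from the compiler's side, A in the question is a cuda_array<int, 512>; the values 16 and 32 are simply gone.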

This means that deduction is not performed against this signature:

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_matrix<T, M, N> &m, cuda_array<T, N> &v)

but against the one obtained by substituting the alias:

template <typename T, const size_t M, const size_t N>
__host__ cuda_array<T, M> operator*(cuda_array<T, M * N> &m, cuda_array<T, N> &v)
//                                  ~~~~~~~~~~~~~~~~~~~^

And so:

§14.8.2.5 [temp.deduct.type]/p16:

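If, in the declaration of a function template with a non-type template-parameter, the non-type template-parameter is used in a subexpression in the function parameter-list, the expression is a non-deduced context as specified above.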

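T and N can be deduced (N from the second argument), but M appears only inside the subexpression M * N and in the return type, which takes no part in deduction here. M is therefore in a non-deduced context, deduction fails, and this operator* is not considered a viable overload.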
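One way to confirm this diagnosis, sketched under the same host-only assumptions (cuda_array reduced to a std::vector, __host__ and the kernel launch removed, and call_with_explicit_args a hypothetical helper): when every template argument is written out explicitly, nothing is left to deduce, substitution yields cuda_array<int, 512>& for the first parameter, and the call compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for cuda_array (assumption: no CUDA calls).
template <typename T, std::size_t N>
struct cuda_array {
    std::vector<T> host;
    cuda_array() : host(N) {}
};

template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// The question's signature, minus __host__. After alias substitution the first
// parameter is cuda_array<T, M * N>&, where M sits in a non-deduced context.
template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(cuda_matrix<T, M, N>& m, cuda_array<T, N>& v) {
    (void)m;
    (void)v;
    return cuda_array<T, M>();  // placeholder for the kernel launch
}

// Hypothetical helper showing which call forms compile.
inline cuda_array<int, 16> call_with_explicit_args() {
    cuda_matrix<int, 16, 32> A;
    cuda_array<int, 32> b;
    // return A * b;                       // error: M cannot be deduced
    return operator*<int, 16, 32>(A, b);   // OK: all template arguments supplied
}
```

Spelling out the arguments at every call site defeats the point of an operator, so this only confirms where deduction breaks down.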

As a workaround, you can let the compiler deduce the total size of the cuda_array itself and verify that deduced value instead:

template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN/N)*N==MN, cuda_array<T, MN/N>>::type;
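A fuller sketch of this workaround, again assuming a host-only cuda_array stand-in and a placeholder body where the real version would launch matrix_vector_mul. MN and N are each deduced directly from an argument type, the row count MN / N is computed rather than deduced, and std::enable_if removes the overload when N does not divide MN.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for cuda_array (assumption: no CUDA calls).
template <typename T, std::size_t N>
struct cuda_array {
    std::vector<T> host;
    cuda_array() : host(N) {}
};

template <typename T, std::size_t M, std::size_t N>
using cuda_matrix = cuda_array<T, M * N>;

// Both MN and N appear as plain template arguments, so both are deducible.
template <typename T, std::size_t MN, std::size_t N>
auto operator*(const cuda_array<T, MN>& m, const cuda_array<T, N>& v)
    -> typename std::enable_if<(MN / N) * N == MN, cuda_array<T, MN / N>>::type
{
    (void)m;
    (void)v;
    // Placeholder: the real version would launch matrix_vector_mul here.
    return cuda_array<T, MN / N>();
}
```

A side effect of this design is that the operator no longer sees a matrix at all, only two arrays whose sizes divide evenly: cuda_array<int, 32> * cuda_array<int, 32> also matches and yields a cuda_array<int, 1>.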

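Or keep the inheritance trick you already found; then M and N are non-type template parameters of cuda_matrix itself and can be deduced directly from the argument.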
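A host-only sketch of the inheritance approach, under the same simplifying assumptions (stripped-down cuda_array, placeholder body instead of the kernel launch): because cuda_matrix is now a distinct class template rather than an alias, M and N are part of the argument's type and the question's original operator* signature deduces all three parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Host-only stand-in for cuda_array (assumption: no CUDA calls).
template <typename T, std::size_t N>
struct cuda_array {
    std::vector<T> host;
    cuda_array() : host(N) {}
};

// A real class template: cuda_matrix<int, 16, 32> keeps 16 and 32 in its type.
template <typename T, std::size_t M, std::size_t N>
class cuda_matrix : public cuda_array<T, M * N> {};

// T, M and N are deduced from cuda_matrix<T, M, N>&; N is confirmed by the vector.
template <typename T, std::size_t M, std::size_t N>
cuda_array<T, M> operator*(cuda_matrix<T, M, N>& m, cuda_array<T, N>& v) {
    (void)m;
    (void)v;
    return cuda_array<T, M>();  // placeholder for the kernel launch
}
```

This keeps the matrix shape in the type system, at the cost of cuda_matrix and cuda_array<T, M * N> being different types again.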