Static library with multiple .h and .cu files can't resolve functions

Keywords: functions, static library, .cu files, includes. Updated: 2023-10-16

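When compiling a static library with multiple .h and .cu files, I get an unresolved extern function error. Below is a short example that reproduces the error.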

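It seems I can't get Nsight Eclipse Edition to compile extrafunctions.cu first. In my full project, the file containing the extra functions is compiled first, but the build still fails with the unresolved extern function error.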

Here is the build output for this example:

**** Build of configuration Debug for project linkerror ****
make all 
Building file: ../cudatest.cu
Invoking: NVCC Compiler
nvcc -I/usr/local/cuda/include -G -g -O0 -gencode arch=compute_30,code=sm_30 -odir "" -M -o "cudatest.d" "../cudatest.cu"
nvcc --compile -G -I/usr/local/cuda/include -O0 -g -gencode arch=compute_30,code=compute_30 -gencode arch=compute_30,code=sm_30  -x cu -o  "cudatest.o" "../cudatest.cu"
../cudatest.cu(19): warning: variable "devInts" is used before its value is set
../cudatest.cu(19): warning: variable "devInts" is used before its value is set
ptxas fatal   : Unresolved extern function '_Z9incrementi'
make: *** [cudatest.o] Error 255
**** Build Finished ****

cudatest.h:

#ifndef CUDAPATH_H_
#define CUDAPATH_H_
#include <cuda.h>
#include <cuda_runtime.h>
#include "extrafunctions.h"
void test();

#endif /* CUDAPATH_H_ */

cudatest.cu:

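#include <cstdlib>
#include <cuda.h>
#include <cuda_runtime.h>
#include "extrafunctions.h"

// One thread increments every element by calling increment(),
// a __device__ function defined in a different file (extrafunctions.cu).
__global__ void kernel(int* devInts){
    int tid = threadIdx.x + (blockDim.x*blockIdx.x);
    if (tid == 0){
        for(int i = 0; i < NUMINTS; i++){
            devInts[i] = increment(devInts[i]);
        }
    }
}
void test(){
    int* myInts = (int*)malloc(NUMINTS * sizeof(int));
    for(int i = 0; i < NUMINTS; i++) myInts[i] = i;           // initialize host data
    int* devInts;
    cudaMalloc((void**)&devInts, NUMINTS*sizeof(int));       // allocate the device buffer first
    cudaMemcpy(devInts, myInts, NUMINTS*sizeof(int), cudaMemcpyHostToDevice);
    kernel<<<1,1>>>(devInts);
    int* outInts = (int*)malloc(NUMINTS * sizeof(int));
    cudaMemcpy(outInts, devInts, NUMINTS*sizeof(int), cudaMemcpyDeviceToHost); // copy results back
    cudaFree(devInts);
    free(myInts);
    free(outInts);
}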

extrafunctions.h:

#ifndef EXTRAFUNCTIONS_H_
#define EXTRAFUNCTIONS_H_
#include <cuda.h>
#include <cuda_runtime.h>
#define NUMINTS 4
int __device__ increment(int i);
#endif /* EXTRAFUNCTIONS_H_ */

extrafunctions.cu:

#include <cuda.h>
#include <cuda_runtime.h>
#include "extrafunctions.h"

int __device__ increment(int i){
    return i+1;
}

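You need to explicitly enable separate compilation for this to work. Right-click your project, choose "Properties", go to Build > CUDA, and select the "Separate compilation" linker mode.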

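Note that separate compilation only works on SM 2.0+ GPUs and can only emit SASS; for example, you cannot emit PTX that would be compatible with future CUDA devices. For more information, read "Using Separate Compilation in CUDA" in the NVCC manual.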

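Update: You need to use the NVCC linker to link device code, which is why the GCC linker fails. In Nsight, you can either link the whole application with NVCC, or set up a static library project that contains all the CUDA code and is built with the NVCC toolchain, plus a regular C/C++ project that is built with GCC and links against the static library produced by the first project.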