Generating CUSP coo_matrix from passed FORTRAN arrays

Keywords: CUSP, coo_matrix, arrays, FORTRAN. Updated: 2023-10-16

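I am working on integrating the CUSP solvers into an existing FORTRAN code. As a first step, I am simply trying to pass a pair of integer arrays and a float array (real*4 in FORTRAN) from FORTRAN, which will be used to construct and then print a COO-format CUSP matrix.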

So far I have been able to follow this thread and get everything to compile and link: "Unresolved references using IFORT with nvcc and CUSP".

Unfortunately, the program is apparently sending garbage into the CUSP matrix and ends up crashing with the following error:

$./fort_cusp_test
 testing 1 2 3
sparse matrix <1339222572, 1339222572> with 1339222568 entries
libc++abi.dylib: terminating with uncaught exception of type thrust::system::system_error: invalid argument
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0  0x10ff86ff6
#1  0x10ff86593
#2  0x7fff8593ff19
Abort trap: 6

The CUDA and Fortran source code is as follows:

cusp_runner.cu

#include <stdio.h>
#include <cusp/coo_matrix.h>
#include <iostream>
#include <cusp/krylov/cg.h>
#include <cusp/print.h>
#if defined(__cplusplus)
extern "C" {
#endif
void test_coo_mat_print_(int * row_i, int * col_j, float * val_v, int n, int nnz ) {
   //wrap raw input pointers with thrust::device_ptr
   thrust::device_ptr<int> wrapped_device_I(row_i);
   thrust::device_ptr<int> wrapped_device_J(col_j);
   thrust::device_ptr<float> wrapped_device_V(val_v);
   //use array1d_view to wrap individual arrays
   typedef typename cusp::array1d_view< thrust::device_ptr<int> > DeviceIndexArrayView;
   typedef typename cusp::array1d_view< thrust::device_ptr<float> > DeviceValueArrayView;
   DeviceIndexArrayView row_indices(wrapped_device_I, wrapped_device_I + n);
   DeviceIndexArrayView column_indices(wrapped_device_J, wrapped_device_J + nnz);
   DeviceValueArrayView values(wrapped_device_V, wrapped_device_V + nnz);
   //combine array1d_views into coo_matrix_view
   typedef   cusp::coo_matrix_view<DeviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView> DeviceView;
   //construct coo_matrix_view from array1d_views
   DeviceView A(n,n,nnz,row_indices,column_indices,values);
   cusp::print(A);
}
#if defined(__cplusplus)
}
#endif

fort_cusp_test.f90

program fort_cuda_test
   implicit none
interface
   subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
      use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
      implicit none
      integer(C_INT) :: n, nnz, row_i(:), col_j(:)
      real(C_FLOAT) :: val_v(:)
   end subroutine test_coo_mat_print_
end interface
   integer*4   n
   integer*4   nnz
   integer*4, target :: rowI(9),colJ(9)
   real*4, target :: valV(9)
   integer*4, pointer ::   row_i(:)
   integer*4, pointer ::   col_j(:)
   real*4, pointer ::   val_v(:)
   n     =  3
   nnz   =  9
   rowI =  (/ 1, 1, 1, 2, 2, 2, 3, 3, 3/)
   colJ =  (/ 1, 2, 3, 1, 2, 3, 1, 2, 3/)
   valV =  (/ 1, 2, 3, 4, 5, 6, 7, 8, 9/)
   row_i => rowI
   col_j => colJ
   val_v => valV
   write(*,*) "testing 1 2 3"
   call test_coo_mat_print_(row_i,col_j,val_v,n,nnz)
end program fort_cuda_test

In case you want to try it yourself, here is my (rather inelegant) makefile:

Test:
   nvcc -Xcompiler="-fPIC" -shared cusp_runner.cu -o cusp_runner.so -I/Developer/NVIDIA/CUDA-6.5/include/cusp
   gfortran -c fort_cusp_test.f90
   gfortran fort_cusp_test.o cusp_runner.so -L/Developer/NVIDIA/CUDA-6.5/lib -lcudart -o fort_cusp_test
clean:
   rm *.o *.so

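The library paths will of course need to be changed as appropriate.

Can anyone point me in the right direction on how to properly pass the required arrays from the Fortran code?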


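After removing the interface block and adding a print statement at the start of the C function, I can see that the arrays are passed correctly, but n and nnz are causing problems. I get the following output: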

$ ./fort_cusp_test
 testing 1 2 3
n: 1509677596, nnz: 1509677592
     i,  row_i,  col_j,        val_v
     0,      1,      1,   1.0000e+00
     1,      1,      2,   2.0000e+00
     2,      1,      3,   3.0000e+00
     3,      2,      1,   4.0000e+00
     4,      2,      2,   5.0000e+00
     5,      2,      3,   6.0000e+00
     6,      3,      1,   7.0000e+00
     7,      3,      2,   8.0000e+00
     8,      3,      3,   9.0000e+00
     9,      0,  32727,   0.0000e+00
    ...
    etc
    ...
    Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0  0x105ce7ff6
#1  0x105ce7593
#2  0x7fff8593ff19
#3  0x105c780a2
#4  0x105c42dbc
#5  0x105c42df4
Segmentation fault: 11

fort_cusp_test.f90

    interface
       subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
          use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
          implicit none
          integer(C_INT),value :: n, nnz
          integer(C_INT) :: row_i(:), col_j(:)
          real(C_FLOAT) :: val_v(:)
       end subroutine test_coo_mat_print_
    end interface
       integer*4   n
       integer*4   nnz
       integer*4, target :: rowI(9),colJ(9)
       real*4, target :: valV(9)
       integer*4, pointer ::   row_i(:)
       integer*4, pointer ::   col_j(:)
       real*4, pointer ::   val_v(:)
       n     =  3
       nnz   =  9
       rowI =  (/ 1, 1, 1, 2, 2, 2, 3, 3, 3/)
       colJ =  (/ 1, 2, 3, 1, 2, 3, 1, 2, 3/)
       valV =  (/ 1, 2, 3, 4, 5, 6, 7, 8, 9/)
       row_i => rowI
       col_j => colJ
       val_v => valV
       write(*,*) "testing 1 2 3"
       call test_coo_mat_print_(rowI,colJ,valV,n,nnz)
    end program fort_cuda_test

cusp_runner.cu

   #include <stdio.h>
    #include <cusp/coo_matrix.h>
    #include <iostream>
    // #include <cusp/krylov/cg.h>
    #include <cusp/print.h>
    #if defined(__cplusplus)
    extern "C" {
    #endif
    void test_coo_mat_print_(int * row_i, int * col_j, float * val_v, int n, int nnz ) {
       printf("n: %d, nnz: %d\n",n,nnz);
       printf("%6s, %6s, %6s, %12s \n","i","row_i","col_j","val_v");
       for(int i=0;i<n;i++) {
          printf("%6d, %6d, %6d, %12.4e\n",i,row_i[i],col_j[i],val_v[i]);
       }
       if ( false ) {
       //wrap raw input pointers with thrust::device_ptr
       thrust::device_ptr<int> wrapped_device_I(row_i);
       thrust::device_ptr<int> wrapped_device_J(col_j);
       thrust::device_ptr<float> wrapped_device_V(val_v);
       //use array1d_view to wrap individual arrays
       typedef typename cusp::array1d_view< thrust::device_ptr<int> > DeviceIndexArrayView;
       typedef typename cusp::array1d_view< thrust::device_ptr<float> > DeviceValueArrayView;
       DeviceIndexArrayView row_indices(wrapped_device_I, wrapped_device_I + n);
       DeviceIndexArrayView column_indices(wrapped_device_J, wrapped_device_J + nnz);
       DeviceValueArrayView values(wrapped_device_V, wrapped_device_V + nnz);
       //combine array1d_views into coo_matrix_view
       typedef cusp::coo_matrix_view<DeviceIndexArrayView,DeviceIndexArrayView,DeviceValueArrayView> DeviceView;
       //construct coo_matrix_view from array1d_views
       DeviceView A(n,n,nnz,row_indices,column_indices,values);
       cusp::print(A); }
    }
    #if defined(__cplusplus)
    }
    #endif

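There are two ways to pass arguments from Fortran to a C routine: the first uses an interface block (the modern Fortran approach), and the second does not (an older approach that works even in Fortran 77).

First, here is the approach that uses an interface block. Because the C routine expects to receive C pointers (row_i, col_j, and val_v), we need to pass the addresses of those variables from the Fortran side. To do so, we have to use an asterisk (*) rather than a colon (:) in the interface block, as shown below. (If we use a colon, this tells the Fortran compiler to send the address of a Fortran pointer-like descriptor object [1] rather than the address of the data itself, which is not the desired behavior.) Also, since n and nnz are declared as values (not pointers) in the C routine, the interface block needs the VALUE attribute on these variables so that the Fortran compiler sends the values of n and nnz rather than their addresses. To summarize, in this first approach the Fortran and C routines look like this: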

Fortran routine:
...
interface
    subroutine test_coo_mat_print_(row_i,col_j,val_v,n,nnz) bind(C)
        use, intrinsic :: ISO_C_BINDING, ONLY: C_INT,C_FLOAT
        implicit none
        integer(C_INT) :: row_i(*), col_j(*)
        real(C_FLOAT) :: val_v(*)
        integer(C_INT), value :: n, nnz     !! see note [2] below also
    end subroutine test_coo_mat_print_
end interface
...
call test_coo_mat_print_( rowI, colJ, valV, n, nnz )

C routine:
void test_coo_mat_print_ (int * row_i, int * col_j, float * val_v, int n, int nnz )

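Next is the second approach, without an interface block. Here, first remove the interface block and the array pointers entirely, and modify the Fortran code as follows: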

Fortran routine:
integer  rowI( 9 ), colJ( 9 ), n, nnz     !! no TARGET attribute necessary
real     valV( 9 )
! ...set rowI etc as above...
call test_coo_mat_print ( rowI, colJ, valV, n, nnz )   !! "_" is dropped

and the C routine as follows:

void test_coo_mat_print_ ( int* row_i, int* col_j, float* val_v, int* n_, int* nnz_ )
{
    int n = *n_, nnz = *nnz_;
    printf( "%d %d \n", n, nnz );
    for( int k = 0; k < 9; k++ ) {
        printf( "%d %d %10.6f \n", row_i[ k ], col_j[ k ], val_v[ k ] );
    }
    // now go to thrust...
}

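Note that n_ and nnz_ are declared as pointers in the C routine because, without an interface block, the Fortran compiler always sends the addresses of the actual arguments to the C routine. Also note that the C routine above prints the contents of row_i etc. to make sure the arguments are passed correctly. If the printed values are correct, then I guess the problem is more likely in the calls to the thrust routines (including how size information such as n and nnz is passed).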
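Once the arguments arrive intact, a few things in the question's code may still be worth checking before handing the arrays to thrust. The row-index view there is sized with n, whereas a COO row array holds nnz entries. The arrays printed by the C routine live in host memory, so they would presumably need to be copied to the device (for example with cudaMemcpy) before being wrapped in thrust::device_ptr. And the Fortran side fills the indices 1-based, while CUSP uses 0-based indices. The last point can be sketched in plain C as below; to_zero_based is a hypothetical helper name, not part of CUSP or thrust.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a CUSP/thrust function): converts nnz 1-based
 * Fortran indices to 0-based indices in place.
 * Every index is validated against the range [1, n] first; if any index is
 * out of range the array is left untouched and -1 is returned.
 * Returns 0 on success. */
int to_zero_based(int *idx, int nnz, int n)
{
    assert(idx != NULL);
    for (int k = 0; k < nnz; k++) {
        if (idx[k] < 1 || idx[k] > n) {
            return -1;          /* reject without modifying the input */
        }
    }
    for (int k = 0; k < nnz; k++) {
        idx[k] -= 1;            /* shift from 1-based to 0-based */
    }
    return 0;
}
```

Such a helper would be called on both row_i and col_j (with the same nnz and n) before the data is copied to the device. Whether to convert in C or to fill 0-based indices on the Fortran side is a design choice.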

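[1] A Fortran pointer declared as "real, pointer :: a(:)" actually represents something like an array view class (in C++ terms), which is distinct from the actual data it points to. What needs to be sent here is the address of the actual data, not the address of this array view object. Also, the asterisk in the interface block (a(*)) denotes an assumed-size array, which is the old way of passing arrays in Fortran. In this case, the address of the first element of the array is passed, as expected.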
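The distinction in note [1] can be illustrated with a small C sketch. The array_view struct below is a hypothetical stand-in invented for this example; real Fortran array descriptors are compiler-specific and laid out differently. The point is only that the address of the view object and the address of the data it refers to are different things, and a C routine declared with float* needs the latter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiler's array descriptor ("array view").
 * Real descriptors are compiler-specific; this layout is an assumption. */
typedef struct {
    float *base;    /* address of the first element of the data */
    size_t extent;  /* number of elements */
    size_t stride;  /* distance between consecutive elements, in elements */
} array_view;

/* Builds a view over contiguous data. */
array_view make_view(float *data, size_t extent)
{
    array_view v = { data, extent, 1 };
    return v;
}

/* A routine declared like the C side in this thread: it expects the address
 * of the data itself, not the address of a view object. */
float first_element(const float *data)
{
    assert(data != NULL);
    return data[0];
}
```

Calling first_element(view.base) reads the data as intended. Passing the address of the view itself, cast to float*, would make the routine reinterpret the descriptor's bytes as floats, which is the kind of garbage described in note [1].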

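[2] If n and nnz are declared as pointers in the C routine (as in the second approach), this VALUE attribute should not be attached, because the C routine then needs the addresses of the actual arguments rather than their values.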
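Seen from the C side, the two conventions differ only in how the scalar arguments are declared. The sketch below uses hypothetical function names (sum_values_by_value, sum_values_by_reference) that do not appear in the original code. The first signature pairs with an interface block where nnz carries VALUE; the second pairs with a call made without VALUE, or without any interface block.

```c
#include <assert.h>
#include <stddef.h>

/* Pairs with "integer(C_INT), value :: nnz" on the Fortran side:
 * nnz arrives as a plain int. */
float sum_values_by_value(const float *val_v, int nnz)
{
    assert(val_v != NULL && nnz >= 0);
    float s = 0.0f;
    for (int k = 0; k < nnz; k++) {
        s += val_v[k];
    }
    return s;
}

/* Pairs with a Fortran call that passes nnz without VALUE (or with no
 * interface block at all): nnz arrives as an address and must be
 * dereferenced before use. */
float sum_values_by_reference(const float *val_v, const int *nnz)
{
    assert(nnz != NULL);
    return sum_values_by_value(val_v, *nnz);
}
```

Mixing the two (a by-value C signature called with an address) is consistent with the very large n and nnz values shown in the question's output, since the address would be read as if it were the integer.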