Switching to MPI_LONG_LONG_INT crashes?

Keywords: LONG, INT, crash, MPI. Updated: 2023-10-16

I have the following code, which all-gathers one value from every process into nbodiesPerProc:

int nBodies = 10;
std::vector<int> nbodiesPerProc(m_processes);  // intended: one slot per rank in m_comm
int err = MPI_Allgather(&nBodies, 1, MPI_INT, &nbodiesPerProc[0], 1, MPI_INT, m_comm);
ASSERTMPIERROR(err, "gather");
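MPI_Allgather stores one block of recvcount elements from every rank of the communicator, so the receive buffer must hold (communicator size) × recvcount elements. Below is a minimal sketch of a hypothetical helper, make_allgather_recv_buffer (not an MPI function), that sizes the buffer and rejects a communicator size below 1. It assumes the size is obtained with MPI_Comm_size on the same communicator passed to MPI_Allgather, and it is plain C++ with no MPI dependency, so the sizing logic can be checked on its own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an MPI function): allocates the receive buffer for
// an allgather in which every rank contributes count_per_rank elements.
// comm_size is assumed to come from MPI_Comm_size on the same communicator
// that is later passed to MPI_Allgather.
template <typename T>
std::vector<T> make_allgather_recv_buffer(int comm_size, int count_per_rank) {
    if (comm_size < 1) {
        // Every communicator contains at least the calling rank, so a size
        // below 1 means the value was never set correctly.
        throw std::invalid_argument("communicator size must be at least 1");
    }
    if (count_per_rank < 0) {
        throw std::invalid_argument("count per rank must not be negative");
    }
    // Rank r's block starts at element r * count_per_rank, so the buffer
    // needs comm_size * count_per_rank elements in total.
    return std::vector<T>(static_cast<std::size_t>(comm_size) *
                          static_cast<std::size_t>(count_per_rank));
}
```

In the question's code this could replace the vector declaration, for example auto nbodiesPerProc = make_allgather_recv_buffer<int>(m_processes, 1); with nbodiesPerProc.data() passed as the receive buffer. Failing early turns a silent out-of-bounds write into an explicit error.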

When I change the code to use MPI_LONG_LONG_INT, it starts crashing:

std::size_t nBodies = 10;
static_assert(sizeof(std::size_t) == 8, "We send a 64-bit integer");
std::vector<std::size_t> nbodiesPerProc(m_processes);  // intended: one slot per rank in m_comm
int err = MPI_Allgather(&nBodies, 1, MPI_LONG_LONG_INT, &nbodiesPerProc[0],
                        1, MPI_LONG_LONG_INT, m_comm);
ASSERTMPIERROR(err, "gather");

Does anyone know what is going on? Do I need to register MPI_LONG_LONG_INT?

The crash:

[zfmgpu:17069] Signal: Segmentation fault (11)
[zfmgpu:17069] Signal code: Address not mapped (1)
[zfmgpu:17069] Failing at address: 0x10
[zfmgpu:17070] *** Process received signal ***
[zfmgpu:17070] Signal: Segmentation fault (11)
[zfmgpu:17070] Signal code: Address not mapped (1)
[zfmgpu:17070] Failing at address: 0x18
[zfmgpu:17067] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fe4626a6340]
[zfmgpu:17067] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x981c0) [0x7fe4609641c0]
[zfmgpu:17067] [ 2] /usr/lib/libmpi.so.1(+0x10362d) [0x7fe46162d62d]
[zfmgpu:17067] [ 3] /usr/lib/libmpi.so.1(ompi_datatype_sndrcv+0x502) [0x7fe46158e392]
[zfmgpu:17067] [ 4] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_allgather_intra_recursivedoubling+0x91) [0x7fe459aa0081]
[zfmgpu:17067] [ 5] /usr/lib/libmpi.so.1(PMPI_Allgather+0x179) [0x7fe46158f0c9]

Update: MPI_UNSIGNED_LONG_LONG, which would be the correct 64-bit type, does not help either.
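Why MPI_UNSIGNED_LONG_LONG is the better match, and why switching to it could not fix the crash, can be checked at compile time. In the MPI standard, MPI_LONG_LONG_INT describes the C type long long (signed) and MPI_UNSIGNED_LONG_LONG describes unsigned long long, while std::size_t is unsigned. The static_assert in the question only compares sizes; the hypothetical trait below (not part of MPI or the standard library) also compares signedness. On a homogeneous system MPI_Allgather only moves bytes, so a same-size signed/unsigned mismatch copies the same bits and would not by itself explain a segfault.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true when a C++ buffer element type has the same size
// and signedness as the C type that an MPI predefined datatype describes.
template <typename Buffer, typename MpiCType>
constexpr bool matches_mpi_c_type() {
    return sizeof(Buffer) == sizeof(MpiCType) &&
           std::is_signed<Buffer>::value == std::is_signed<MpiCType>::value;
}

// std::size_t with MPI_LONG_LONG_INT (C type: long long): signedness differs.
constexpr bool size_t_vs_long_long = matches_mpi_c_type<std::size_t, long long>();

// std::size_t with MPI_UNSIGNED_LONG_LONG (C type: unsigned long long):
// matches wherever std::size_t is 64 bits wide.
constexpr bool size_t_vs_unsigned_long_long =
    matches_mpi_c_type<std::size_t, unsigned long long>();

// Fixed-width alternative: std::uint64_t is exactly 64 bits and unsigned.
constexpr bool uint64_vs_unsigned_long_long =
    matches_mpi_c_type<std::uint64_t, unsigned long long>();
```

Strictly speaking, on typical 64-bit Linux (LP64) std::size_t is unsigned long, whose exact MPI counterpart is MPI_UNSIGNED_LONG; unsigned long long has the same size there, so either works in practice.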

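Found the bug:

m_processes was 0, so nbodiesPerProc was empty and &nbodiesPerProc[0] did not point to valid storage; MPI_Allgather then tried to write one value per rank into a buffer with room for none.

(A silly mistake.)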
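The failing addresses in the log fit this explanation. MPI_Allgather places rank r's block at byte offset r × recvcount × extent in the receive buffer, and the ompi_datatype_sndrcv frame under the recursive-doubling allgather in the backtrace appears to be the step that copies each rank's own contribution into its slot. Taking the address of element 0 of an empty std::vector is undefined behaviour, but in practice it typically yields a null pointer, so the write lands at the offset itself. Assuming PIDs 17069 and 17070 are ranks 2 and 3 (an assumption; the log does not show ranks), one 8-byte element per rank gives exactly 0x10 and 0x18. The helper below is hypothetical and only does the arithmetic.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset, from the start of the receive buffer, of
// the block MPI_Allgather stores for `rank` (contiguous datatype, so the
// extent equals the element size).
constexpr std::size_t allgather_block_offset(std::size_t rank,
                                             std::size_t recvcount,
                                             std::size_t extent_bytes) {
    return rank * recvcount * extent_bytes;
}

// With a null receive buffer the faulting address equals the offset:
// ranks 2 and 3, one 8-byte element each.
static_assert(allgather_block_offset(2, 1, 8) == 0x10, "rank 2 -> 0x10");
static_assert(allgather_block_offset(3, 1, 8) == 0x18, "rank 3 -> 0x18");
```

The remedy is to size the receive buffer from the actual communicator size (for example from MPI_Comm_size) before calling MPI_Allgather.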