I can't understand the "recvcounts" parameter of MPI_Gatherv

Updated: 2023-10-16

MPI_Gatherv is one of MPI's interfaces:

int MPI_Gatherv(
    void* sendbuf,
    int sendcount,
    MPI_Datatype sendtype,
    void* recvbuf,
    int *recvcounts,
    int *displs,
    MPI_Datatype recvtype,
    int root,
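    MPI_Comm comm)

The type of "recvcounts" is "int *", so that we can set the number of items to be received from each process individually; however, I found this impossible to achieve:

when recvcounts[i] < sendcount, the root process will receive only sendcount items;

when recvcounts[i] > sendcount, the program will crash, with an error message like this: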

Fatal error in PMPI_Gatherv: Message truncated, error stack:
PMPI_Gatherv(386).....: MPI_Gatherv failed(sbuf=0012FD34, scount=2, MPI_CHAR, rbuf=0012FCC8, rcnts=0012FB30, displs=0012F998, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
MPIR_Gatherv_impl(199):
MPIR_Gatherv(103).....:
MPIR_Localcopy(332)...: Message truncated; 2 bytes received but buffer size is 1

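So does this mean the root has to receive a fixed number of items from every process, and the parameter recvcounts is meaningless? Or have I misunderstood something?

Here is my code: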

#include <mpi.h>
#include <cstdio>   // printf
#include <cstring>  // memset
int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int n, id;
    MPI_Comm_size(MPI_COMM_WORLD, &n);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    char x[100], y[100];
    memset(x, '0' + id, sizeof(x));
    memset(y, '%', sizeof(y));
    int cnts[100], offs[100] = {0};
    for (int i = 0; i < n; i++)
    {
        cnts[i] = i + 1;
        if (i > 0)
        {
            offs[i] = offs[i - 1] + cnts[i - 1];
        }
    }
    MPI_Gatherv(x, 1, MPI_CHAR, y, cnts, offs, MPI_CHAR, 0, MPI_COMM_WORLD);    // receive only 1 item from each process
    //MPI_Gatherv(x, 2, MPI_CHAR, y, cnts, offs, MPI_CHAR, 0, MPI_COMM_WORLD);    // crash
    if (id == 0)
    {
        printf("Gatherv:\n");
        for (int i = 0; i < 100; i++)
        {
            printf("%c ", y[i]);
        }
        printf("\n");
    }
    MPI_Finalize();
    return 0;
}

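As @Alexander Molodih points out, sendcount=recvcount, sendtype=recvtype will always work; but once you start creating your own MPI types, you often have different send and receive types, which is why recvcount may differ from sendcount.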

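As an example, take a look at the recently asked question "MPI partition matrix into blocks"; there, a 2D array is decomposed into blocks and scattered. The send type (which has to pick out only the necessary data from the global array) and the receive type (which is just a contiguous block of data) are different, and so are the counts.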

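That's the general reason why the send and receive types and counts can differ, in sendrecv, gather/scatter, or any other operation where both a send and a receive take place.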
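To make the "different type, different count" point concrete, here is a minimal plain C++ model with no MPI calls in it. The helper name pack_column is hypothetical and only for illustration. It picks one column out of a row-major matrix, which is roughly what the send side describes when it sends a count of 1 of a strided derived type (in MPI such a type could be built with MPI_Type_vector). The receive side would see the same data as nrows contiguous values of MPI_DOUBLE, so the counts are 1 and nrows even though exactly the same doubles are transferred.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an MPI function): models what a single element of a
// strided "column" type selects from a row-major nrows x ncols matrix.
// Send side view:    count = 1 of the strided column type.
// Receive side view: count = nrows of a plain contiguous double type.
std::vector<double> pack_column(const std::vector<double>& matrix,
                                std::size_t nrows,
                                std::size_t ncols,
                                std::size_t col) {
    std::vector<double> contiguous;
    contiguous.reserve(nrows);
    for (std::size_t r = 0; r < nrows; ++r) {
        // Consecutive picked elements are ncols apart in memory (the stride).
        contiguous.push_back(matrix[r * ncols + col]);
    }
    return contiguous;
}
```

For a 3 x 4 matrix holding 0 through 11 in row-major order, pack_column(m, 3, 4, 1) yields the three values 1, 5, 9: one "column" on the sending side, three doubles on the receiving side.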

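In your gatherv example, each process may have its own sendcount, but the recvcounts[] array must be the list of all of those counts so that the receiver can place the incoming data correctly. If you don't know these values beforehand (each rank knows only its own count, cnts[id]), you can do a gather first: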

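int mycount = cnts[id];  // each rank contributes only its own count
MPI_Gather(&mycount, 1, MPI_INT, cnts, 1, MPI_INT, 0, MPI_COMM_WORLD);
if (id == 0) {  // cnts[] and offs[] are significant only at the root
    for (int i = 1; i < n; i++) {
        offs[i] = offs[i - 1] + cnts[i - 1];
    }
}
MPI_Gatherv(x, mycount, MPI_CHAR, y, cnts, offs, MPI_CHAR, 0, MPI_COMM_WORLD);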
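If it helps to see the bookkeeping without running MPI at all, here is a rough plain C++ sketch of what the root ends up with. The names displs_from_counts and model_gatherv are hypothetical, made up for this illustration; they are not part of MPI. The model assumes the rule from the MPI standard that the data sent by rank i must match recvcounts[i] at the root, and it treats any mismatch as an error, which is stricter than what a real implementation may do in practice.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Exclusive prefix sum: displs[i] is where rank i's data starts at the root.
std::vector<int> displs_from_counts(const std::vector<int>& counts) {
    std::vector<int> displs(counts.size(), 0);
    for (std::size_t i = 1; i < counts.size(); ++i) {
        displs[i] = displs[i - 1] + counts[i - 1];
    }
    return displs;
}

// Hypothetical model of the root's receive buffer (not an MPI call).
// sent[i] is what rank i sends; its length plays the role of sendcount on rank i.
// The buffer is pre-filled with '%' so untouched slots stay visible.
std::string model_gatherv(const std::vector<std::string>& sent,
                          const std::vector<int>& recvcounts,
                          std::size_t recvbuf_size) {
    if (sent.size() != recvcounts.size()) {
        throw std::invalid_argument("need one recvcount per rank");
    }
    const std::vector<int> displs = displs_from_counts(recvcounts);
    std::string recvbuf(recvbuf_size, '%');
    for (std::size_t i = 0; i < sent.size(); ++i) {
        // Assumed rule: rank i's sendcount must equal recvcounts[i].
        if (recvcounts[i] < 0 ||
            sent[i].size() != static_cast<std::size_t>(recvcounts[i])) {
            throw std::runtime_error("count mismatch for rank " + std::to_string(i));
        }
        const std::size_t start = static_cast<std::size_t>(displs[i]);
        if (start + sent[i].size() > recvbuf.size()) {
            throw std::out_of_range("receive buffer too small");
        }
        recvbuf.replace(start, sent[i].size(), sent[i]);
    }
    return recvbuf;
}
```

With three ranks sending "0", "11" and "222" and recvcounts of 1, 2, 3, the displacements come out as 0, 1, 3 and a 10-character buffer ends up as 011222%%%%. If rank 0 sent two characters while recvcounts[0] stayed at 1, the model throws, which corresponds to the "Message truncated" failure in the question.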