MPI Send is giving segmentation fault

Keywords: segmentation fault, MPI      Updated: 2023-10-16

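I am trying to run a genetic algorithm using MPI (Boost), in which I have to send a serialized object from rank 0 to all other ranks. But when I try to send the data, I get a segmentation fault.

Here is the code, the output, and the error I am getting.

Code: the problem is exactly at world.send(0, 0, newP);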

int main (int argc, char** argv) 
{
    Population *pop = NULL;
    RuckSack r(true);
    int size, rank;
    Ga ga;
    namespace mpi = boost::mpi;
    mpi::environment env;
    mpi::communicator world;
    int countGeneration = 0;
    /* code */
    if (world.rank() == 0)
    {
        if (pop == NULL)
        {
            pop = new Population(60,true);
        }
    }
    for (int m = 0; m < 20; m++)
    {
        /* code */
        for (int i = 0; i< world.size(); i++)
        {
            world.send(i,0,pop);
        }

        world.recv(0, 0, pop);
        Population newP = *pop;

        newP = ga.evolvePopulation(newP, world.size());


        world.send(0, 0, newP);
    }
    MPI_Finalize();
    return (EXIT_SUCCESS);
}

Error:

mpirun noticed that process rank 0 with PID 10336 on node user exited on signal 11 (Segmentation fault).

Output:

[user:10336] *** Process received signal ***
[user:10336] Signal: Segmentation fault (11)
[user:10336] Signal code: Address not mapped (1)
[user:10336] Failing at address: 0x31
[user:10336] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x35860)[0x7f1e93064860]
[user:10336] [ 1] /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.61.0(+0x14a24)[0x7f1e9409da24]
[user:10336] [ 2] /usr/lib/x86_64-linux-gnu/libboost_serialization.so.1.61.0(+0x15d11)[0x7f1e9409ed11]
[user:10336] [ 3] ./teste(+0x1de7c)[0x55ab4c07ae7c]
[user:10336] [ 4] ./teste(+0x1dd2c)[0x55ab4c07ad2c]
[user:10336] [ 5] ./teste(+0x1db3a)[0x55ab4c07ab3a]
[user:10336] [ 6] ./teste(+0x1d8eb)[0x55ab4c07a8eb]
[user:10336] [ 7] ./teste(+0x1d2da)[0x55ab4c07a2da]
[user:10336] [ 8] ./teste(+0x1cb20)[0x55ab4c079b20]
[user:10336] [ 9] ./teste(+0x1bed0)[0x55ab4c078ed0]
[user:10336] [10] ./teste(+0x1b47c)[0x55ab4c07847c]
[user:10336] [11] ./teste(+0x19741)[0x55ab4c076741]
[user:10336] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f1e9304f3f1]
[user:10336] [13] ./teste(+0x112aa)[0x55ab4c06e2aa]
[user:10336] *** End of error message ***

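Here are some wild guesses:

  1. You should execute the initial send only on the rank 0 process. Right now every process executes it, which makes no sense (and may be the cause of the problem).
  2. You should not send to "yourself". In the first iteration of your loop, rank 0 sends to itself, which AFAIK blocks the process until a matching recv is posted. But since rank 0 is blocked, it never reaches the "recv" line and stays locked forever. Besides that, it makes no sense for a process to send data to itself.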

These are only loose suggestions, since my experience with MPI is limited. Hope this helps!