Block execution until children spawned via MPI_Comm_spawn have finished

Keywords: call, spawn, Comm, MPI, execution. Updated: 2023-10-16

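I am modifying an existing application in which I want to spawn a dynamically created bash script. I created a simple wrapper program that takes the name of the bash script as an argument. In the wrapper, the script is spawned with MPI_Comm_spawn. Immediately afterwards, the wrapper calls MPI_Finalize, which executes before the script has finished: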

#include "mpi.h"
#include <stdlib.h>
#include <iostream>
using namespace std;
int main(int argc, char *argv[])
{
    char *script = argv[1];
    int maxProcs = 2, myRank;
    MPI_Comm childComm;
    int spawnError[maxProcs];
    // Initialize
    argv[1] = NULL;
    MPI_Init(&argc, &argv);
    // Rank of parent process
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);    
    // Spawn application    
    MPI_Comm_spawn(script, MPI_ARGV_NULL, maxProcs, MPI_INFO_NULL, myRank, MPI_COMM_SELF, &childComm, spawnError);
    // Finalize
    MPI_Finalize();
    return EXIT_SUCCESS;
}

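If I insert

    sleep(10);

before

    MPI_Finalize();

everything works fine. Now my question is whether it is possible to block execution in the wrapper until the bash script has finished. It would also be great to get the script's return value. Unfortunately, creating another wrapper for the script, one that communicates with the parent wrapper and executes the bash script via a system call, is not an option, because I need access to the MPI environment variables from within the script. I hope I have made things clear. Any help would be greatly appreciated!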

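If you have control over the content of the bash script, that is, if you can put something into it before spawning it, then a very crude option is to write a special MPI program that contains a single MPI_Barrier call: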

#include <mpi.h>
int main (int argc, char **argv)
{
   MPI_Comm parent;
   MPI_Init(&argc, &argv);
   // Obtain an intercommunicator to the parent MPI job
   MPI_Comm_get_parent(&parent);
   // Check if this process is a spawned one and if so enter the barrier
   if (parent != MPI_COMM_NULL)
      MPI_Barrier(parent);
   MPI_Finalize();
   return 0;
}

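Compile this program like any other MPI program, using the same MPI distribution as the main MPI program, and call it something like waiter. Then set an EXIT trap at the very beginning of the bash script: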

#!/bin/bash
trap "/path/to/waiter $*" EXIT
...
# End of the script file

Also modify the main program to:

// Spawn application    
MPI_Comm_spawn(script, MPI_ARGV_NULL, maxProcs, MPI_INFO_NULL, myRank, MPI_COMM_SELF, &childComm, spawnError);
// Wait for the waiters to enter the barrier
MPI_Barrier(childComm);
// Finalize
MPI_Finalize();

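It is important that waiter is called in the trap as waiter $* so that it receives all the command-line arguments that the bash script itself received, because some older MPI implementations append extra arguments to the spawned executable in order to give it the parent connection information. MPI-2 compliant implementations usually provide this information through the environment in order to support MPI_Init(NULL, NULL).

The way it works is very simple: the trap command instructs the shell to execute waiter whenever the script exits. waiter itself simply establishes an intercommunicator with the parent MPI job and waits on the barrier. Once all spawned scripts have finished, each of them starts the waiter process as part of its exit trap, and the barrier is lifted.

If you cannot modify the script, simply create a wrapper script that calls the actual script, and put the waiter in the wrapper.

Tested and works with Open MPI and Intel MPI.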

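As far as I know, there is no way to make MPI_COMM_SPAWN block; the usual solution is to put an MPI_BARRIER between the spawner and the spawnee. Unfortunately, you are not following the usual model of an MPI application spawning another MPI application. Instead, you are just running a bunch of scripts. To get the result you want, you may need to use something outside of MPI, or find a way to write an MPI wrapper for your remote bash scripts.

Why not use a child MPI application that actually executes the script with fork plus exec? The script name can be passed as an argument to the children created with MPI_Comm_spawn or MPI_Comm_spawn_multiple. These children then wait for the script to finish by calling wait, or by handling SIGCHLD if there is an error. Once the script has finished, the parent and the child MPI processes can enter a barrier and then terminate by calling MPI_Finalize.

The child program would be similar to the one proposed by Hristo Iliev: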

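#include <mpi.h>
#include <stdlib.h>     // exit
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid
#include <unistd.h>     // fork, execvp, _exit
int main (int argc, char **argv){
   MPI_Comm parent;
   MPI_Init(&argc, &argv);
   MPI_Comm_get_parent(&parent);
   pid_t pid = fork();
   if (pid < 0) { // error while forking
       exit (-1);
   } else if (pid == 0) { // child: replace this process with the script
       // Assumption: the script name is passed as argv[1], followed by its own arguments
       if (argc > 1)
           execvp(argv[1], &argv[1]);
       _exit(127); // reached only if no script name was given or execvp failed
   } else { // parent
       int status;
       waitpid(pid, &status, 0); // there are non-blocking alternatives if needed
   }
   if (parent != MPI_COMM_NULL){
      MPI_Barrier(parent);
   }
   MPI_Finalize();
   return 0;
}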
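
The comment in the child program mentions non-blocking alternatives to wait. One such alternative, sketched here under the assumption of a POSIX system, is to poll with waitpid and the WNOHANG flag, so the MPI process can do other work while the script runs. The helper names start_child and poll_child are hypothetical and are not part of MPI or of the answer above.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std=c99/c11 */
#include <assert.h>     /* assert */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WNOHANG, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, execvp, _exit */

/* Hypothetical helper: starts args[0] with the NULL-terminated argument
 * vector args. Returns the child's pid, or -1 if fork failed. */
pid_t start_child(char *const args[])
{
    assert(args != NULL && args[0] != NULL);
    pid_t pid = fork();
    if (pid == 0) {
        execvp(args[0], args);  /* returns only on failure */
        _exit(127);             /* could not execute the program */
    }
    return pid;
}

/* Hypothetical helper: non-blocking check on a child started above.
 * Returns 1 and stores the exit code in *code once the child has ended
 * (*code is -1 if it was killed by a signal), 0 while it is still
 * running, and -1 if waitpid itself failed. */
int poll_child(pid_t pid, int *code)
{
    int status;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0)
        return 0;
    if (r < 0)
        return -1;
    *code = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 1;
}
```

A caller would invoke poll_child from its main loop and enter the barrier with the parent only once it returns a non-zero value.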

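The parent program only needs to issue a spawn (if there is a single script name) or a spawn_multiple (if each spawned MPI process has a different script name), and then set up a barrier on the intercommunicator to the spawned children (the output parameter of the MPI spawn operation).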
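
The question also asked for the script's return value, which the fork-plus-exec approach makes available through the status reported by waitpid. The following is a minimal sketch, assuming a POSIX system; run_and_wait is a hypothetical helper name, not an MPI routine, and it contains no MPI calls so that it can be tried on its own.

```c
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std=c99/c11 */
#include <assert.h>     /* assert */
#include <stdio.h>      /* perror */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, execvp, _exit */

/* Hypothetical helper: runs args[0] with the NULL-terminated argument
 * vector args and blocks until it has finished.
 * Returns the program's exit code (0-255), or -1 if fork or waitpid
 * failed or the program was killed by a signal. */
int run_and_wait(char *const args[])
{
    assert(args != NULL && args[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(args[0], args);  /* returns only on failure */
        _exit(127);             /* could not execute the program */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Inside the child MPI program, the returned code could then be passed to the parent over the intercommunicator before the barrier; that step depends on how the parent wants to collect the results and is not shown here.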