MPI C++ matrix addition, function arguments, and function returns

Keywords: function, arguments, return, addition, C++, MPI. Updated: 2023-10-16

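I have been learning C++ from the internet for the past two years, and I finally need to dig into MPI. I have been scouring stackoverflow and the rest of the internet (including http://people.sc.fsu.edu/~jburkardt/cpp_src/mpi/mpi.html and https://computing.llnl.gov/tutorials/mpi/#LLNL). I think I have some of the logic down, but I am having a hard time wrapping my head around the following: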

#include <mpi.h>
#include <vector>
using namespace std;
vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows);
int main(int argc, char** argv)
{
    vector<double> result;//represents a regular 1D vector
    int id_proc, tot_proc, root_proc = 0;
    int dim = 4;//placeholder value; set to number of "columns" in A and B below
    int rows = 3;//placeholder value; set to number of "rows" of A and B below
    vector<double> A(dim*rows), B(dim*rows);//represent matrices as 1D vectors
    MPI::Init(argc,argv);
    id_proc = MPI::COMM_WORLD.Get_rank();
    tot_proc = MPI::COMM_WORLD.Get_size();
    /*
    initialize A and B here on root_proc with RNG and Bcast to everyone else
    */
    //allow all processors to call function() so they can each work on a portion of A
    result = function(A,B,dim,rows);
    //all processors do stuff with A
    //root_proc does stuff with result (doesn't matter if other processors have updated result)
    MPI::Finalize();
    return 0;
}
vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows)
{
    /*
    purpose of function() is two-fold:
    1. update foo because all processors need the updated "matrix"
    2. get the average of the "rows" of foo and return that to main (only root processor needs this)
    */
    vector<double> output(dim,0);
    //add matrices the way I would normally do it in serial
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < dim; j++)
        {
            foo[i*dim + j] += bar[i*dim + j];//perform "matrix" addition (+= ON PURPOSE)
        }
    }
    //obtain average of rows in foo in serial
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < dim; j++)
        {
            output[j] += foo[i*dim + j];//sum rows of A
        }
    }
    for (int j = 0; j < dim; j++)
    {
        output[j] /= rows;//divide to obtain average
    }
    return output;
}

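The code above is only meant to illustrate the concept. My main concern is parallelizing the matrix addition, but here is what confuses me:

1) If each processor only works on a portion of that loop (naturally, I would have to modify the loop bounds per processor), what command do I use to merge all the portions of A back into a single, updated A in the memory of every processor? My guess is that I have to do some kind of Alltoall where each processor sends its portion of A to all other processors, but how do I guarantee that (for example) row 3, worked on by processor 3, overwrites row 3 on the other processors and not row 1 by accident?

2) If I use an Alltoall inside function(), do all processors have to be allowed to step into function(), or can I isolate function() using...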

if (id_proc == root_proc)
{
    result = function(A,B,dim,rows);
}

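...and then handle all the parallelization inside function()? As silly as it sounds, I am trying to do most of the work on one processor (with broadcasts) and parallelize only the big, time-consuming for loops. I just want to keep the code conceptually simple so I can get my results and move on.

3) For the averaging part, I am sure I can just use a reduce command if I want to parallelize it, correct?

Also, as an aside: is there a way to call Bcast() so that it is blocking? I would like to use it to synchronize all my processors (the boost libraries are not an option). If not, I will just go with Barrier(). Thank you for your answer to this question, and thanks to the stackoverflow community for teaching me how to program over the past two years! :)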

1) The function you are looking for is MPI_Allgather. MPI_Allgather lets you send a row from each processor and receive the result on all processors.
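The worry in the question about row 3 landing on row 1 comes down to index arithmetic: each rank owns a contiguous block of A, and the gather places that block at a fixed offset on every rank. The following is only a minimal serial sketch of that bookkeeping, not code from the original post. The names `RowPartition`, `partition_rows`, and `add_block` are hypothetical, and the choice of MPI_Allgatherv (the variable-count form of MPI_Allgather, useful when the rows do not divide evenly among the ranks) is an assumption. The MPI call itself appears only as a comment so the sketch compiles without an MPI installation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the original post): describes how `rows` rows
// of `dim` doubles are split into contiguous blocks, one block per rank.
struct RowPartition {
    std::vector<int> counts;  // number of doubles owned by each rank
    std::vector<int> displs;  // offset (in doubles) of each rank's block within A
};

RowPartition partition_rows(int rows, int dim, int nprocs) {
    assert(rows >= 0 && dim >= 0 && nprocs > 0);
    RowPartition p;
    p.counts.resize(nprocs);
    p.displs.resize(nprocs);
    const int base = rows / nprocs;   // rows every rank gets
    const int extra = rows % nprocs;  // the first `extra` ranks get one more row
    int offset = 0;
    for (int r = 0; r < nprocs; ++r) {
        const int my_rows = base + (r < extra ? 1 : 0);
        p.counts[r] = my_rows * dim;
        p.displs[r] = offset;
        offset += p.counts[r];
    }
    return p;
}

// Per-rank work: rank `rank` updates only the elements of its own block.
void add_block(std::vector<double>& foo, const std::vector<double>& bar,
               const RowPartition& p, int rank) {
    const int begin = p.displs[rank];
    const int end = begin + p.counts[rank];
    for (int k = begin; k < end; ++k) {
        foo[k] += bar[k];
    }
}

// In an MPI program every rank would then call (C bindings), so that each
// block is copied to the same offset in foo on all ranks:
//   MPI_Allgatherv(MPI_IN_PLACE, 0, MPI_DATATYPE_NULL,
//                  foo.data(), p.counts.data(), p.displs.data(),
//                  MPI_DOUBLE, MPI_COMM_WORLD);
```

Because the displacements are computed identically on every rank, the block updated by a given rank can only land at that rank's offset, which is what prevents one row from overwriting another.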

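2) Yes, you can use only a subset of the processors inside your function. Since MPI functions work with communicators, you have to create a separate communicator for that. I do not know how this is done in the C++ bindings, but the C bindings use the MPI_Comm_create function.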

3) Yes, see MPI_Allreduce.
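For the averaging step, the usual pattern is that each rank sums its own rows column by column, the partial sums are added element-wise across ranks, and the total is divided by the full row count. The sketch below is a serial illustration of that pattern under stated assumptions. The helpers `partial_column_sums` and `combine_and_average` are hypothetical names, and `combine_and_average` merely stands in for what a sum reduction would do. The corresponding MPI call is shown as a comment only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sums each column j over rows [first_row, first_row + n_rows).
// Each rank would call this on the rows it owns.
std::vector<double> partial_column_sums(const std::vector<double>& foo, int dim,
                                        int first_row, int n_rows) {
    assert(dim >= 0 && first_row >= 0 && n_rows >= 0);
    assert(static_cast<std::size_t>(first_row + n_rows) * dim <= foo.size());
    std::vector<double> sums(dim, 0.0);
    for (int i = first_row; i < first_row + n_rows; ++i) {
        for (int j = 0; j < dim; ++j) {
            sums[j] += foo[i * dim + j];
        }
    }
    return sums;
}

// Serial stand-in for the reduction: adds the partial sums element-wise,
// then divides by the total number of rows. With MPI (C bindings) the
// element-wise sum would instead be done on every rank by:
//   MPI_Allreduce(MPI_IN_PLACE, sums.data(), dim, MPI_DOUBLE, MPI_SUM,
//                 MPI_COMM_WORLD);
std::vector<double> combine_and_average(
        const std::vector<std::vector<double>>& partials, int rows) {
    assert(!partials.empty() && rows > 0);
    std::vector<double> avg(partials.front().size(), 0.0);
    for (const auto& p : partials) {
        for (std::size_t j = 0; j < avg.size(); ++j) {
            avg[j] += p[j];
        }
    }
    for (double& v : avg) {
        v /= rows;
    }
    return avg;
}
```

If only the root processor needs the averages, as the question states, MPI_Reduce with the root's rank would be sufficient. MPI_Allreduce additionally leaves the result on every rank.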

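Aside: Bcast blocks a process until the send/receive operation assigned to that process has finished. If you want to wait for all processors to finish their work (I have no idea why you would want to do this), you should use Barrier().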

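Extra note: I would not recommend using the C++ bindings, because they are deprecated (deprecated in MPI-2.2 and removed in MPI-3.0) and you will not find specific examples of how to use them. If you want C++ bindings, Boost MPI is the library to use, but it does not cover all MPI functions.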