MPI_Irecv does not receive all sends?
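What I am trying to achieve in this simplified code is:
- Two types of processes (root and children, with ids/ranks 10 and 0-9 respectively)
- Init:
  - root will listen for the children's "completed" messages
  - children will listen for root's notification, sent once all have completed
- While there is no winner (not all done yet):
  - children will have a 20% chance of finishing (and notify root that they are done)
  - root will check whether all are done
    - if all done: send the "winner" notification to the children
I have code like this: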
int numprocs, id, arr[10], winner = -1;
bool stop = false;
MPI_Request reqs[10], winnerNotification;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &id);
for (int half = 0; half < 1; half++) {
    for (int round = 0; round < 1; round++) {
        if (id == 10) { // root
            // keeps track of who has "completed"
            fill_n(arr, 10, -1);
            for (int i = 0; i < 10; i++) {
                MPI_Irecv(&arr[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);
            }
        } else if (id < 10) { // children
            // listen to root of winner notification/indication to stop
            MPI_Irecv(&winner, 1, MPI_INT, 10, 1, MPI_COMM_WORLD, &winnerNotification);
        }
        while (winner == -1) {
            //cout << id << " is in loop" << endl;
            if (id < 10 && !stop && ((rand() % 10) + 1) < 3) {
                // children has 20% chance to stop (finish work)
                MPI_Send(&id, 1, MPI_INT, 10, 0, MPI_COMM_WORLD);
                cout << id << " sending to root" << endl;
                stop = true;
            } else if (id == 10) {
                // root checks number of children completed
                int numDone = 0;
                for (int i = 0; i < 10; i++) {
                    if (arr[i] >= 0) {
                        //cout << "root knows that " << i << " has completed" << endl;
                        numDone++;
                    }
                }
                cout << "numDone = " << numDone << endl;
                // if all done, send notification to players to stop
                if (numDone == 10) {
                    winner = 1;
                    for (int i = 0; i < 10; i++) {
                        MPI_Send(&winner, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
                    }
                    cout << "root sent notification of winner" << endl;
                }
            }
        }
    }
}
MPI_Finalize();
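As a quick sanity check of the "20% chance" in the code above: (rand() % 10) + 1 takes the values 1 to 10, so the test < 3 succeeds for 2 of the 10 values. The sketch below simply enumerates those outcomes at compile time. The helper name outcomes_below is hypothetical, the check assumes rand() % 10 is uniform, and it needs C++14 or later for the constexpr loop.

```cpp
// Counts how many of the ten values of (r % 10) + 1, i.e. 1..10,
// are strictly below `threshold`. Hypothetical helper, not part of the post.
constexpr int outcomes_below(int threshold) {
    int hits = 0;
    for (int r = 0; r < 10; ++r) {
        if ((r % 10) + 1 < threshold) {
            ++hits;
        }
    }
    return hits;
}

// 2 of 10 outcomes satisfy "< 3", so the chance per draw is 20%.
static_assert(outcomes_below(3) == 2, "expected 2 of 10 outcomes");
```

Note that this listing never calls srand, so every rank draws the same rand() sequence; that does not change the per-draw probability, but it does mean the children are not independent of each other.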
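The output of the debugging cout statements looks like this. The problem seems to be that root did not receive all of the children's notifications that they have completed?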
2 sending to root
3 sending to root
0 sending to root
4 sending to root
1 sending to root
8 sending to root
9 sending to root
numDone = 1
numDone = 1
... // many numDone = 1, but why 1 only?
7 sending to root
...
I thought that maybe I cannot receive into an array, but I tried:
if (id == 1) {
    int x = 60;
    MPI_Send(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
} else if (id == 0) {
    MPI_Recv(&arr[1], 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    cout << id << " recieved " << arr[1] << endl;
}
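This works.
UPDATE
The problem seems to go away if I add an MPI_Barrier(MPI_COMM_WORLD) before the end of the while loop, but why? Even if the processes run out of sync, the children will eventually tell root that they have completed, and root is supposed to "listen" for that and process it accordingly? What seems to be happening is that root keeps running and hogs all the resources the children need to execute? Or what is happening here?
UPDATE 2: some children do not get the notification from root
OK, now that the problem of root not receiving the children's completion notifications is solved by @MichaelSh's answer, I am focusing on the children not receiving the notification from the parent. Here is code that reproduces the problem: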
int numprocs, id, arr[10], winner = -1;
bool stop = false;
MPI_Request reqs[10], winnerNotification;
MPI_Init(NULL, NULL);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &id);
srand(time(NULL) + id);
if (id < 10) {
    MPI_Irecv(&winner, 1, MPI_INT, 10, 0, MPI_COMM_WORLD, &winnerNotification);
}
MPI_Barrier(MPI_COMM_WORLD);
while (winner == -1) {
    cout << id << " is in loop ..." << endl;
    if (id == 10) {
        if (((rand() % 10) + 1) < 2) {
            winner = 2;
            for (int i = 0; i < 10; i++) {
                MPI_Send(&winner, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
            cout << "winner notifications sent" << endl;
        }
    }
}
cout << id << " b4 MPI_Finalize. winner is " << winner << endl;
MPI_Finalize();
The output looks like:
# 1 run
winner notifications sent
10 b4 MPI_Finalize. winner is 2
9 b4 MPI_Finalize. winner is 2
0 b4 MPI_Finalize. winner is 2
# another run
winner notifications sent
10 b4 MPI_Finalize. winner is 2
8 b4 MPI_Finalize. winner is 2
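Notice that some processes do not seem to get the notification from the parent? Why is that? Wouldn't an MPI_Wait in the child processes just hang them? So how do I resolve this?
Also, regarding this comment:
"All MPI_Barrier does in your case is wait for the child responses to complete. Please check my answer for a better solution."
If I do not do this, I suppose each child's response would take just a few milliseconds? So even if I do not wait or use a barrier, I would expect the receive to still happen soon after the send? Unless some processes end up hogging the resources and the other processes do not run?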
Please try this code block (error checking omitted for simplicity):
...
// root checks number of children completed
int numDone = 0;
MPI_Status statuses[10];
MPI_Waitall(10, reqs, statuses);
for (int i = 0; i < 10; i++) {
...
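EDIT: a better solution
Each child initiates a receive for root's winner notification and sends its own notification to root.
Root initiates receives for the children's notifications into the array, waits until all of them have been received, and then sends the winner's id to the children. Insert this code after for (int round = 0; round < 1; round++)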
if (id == 10)
{ // root
    // keeps track of who has "completed"
    memset(arr, -1, sizeof(arr));
    for (int i = 0; i < 10; i++)
    {
        MPI_Irecv(&arr[i], 1, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);
    }
}
else if (id < 10)
{ // children
    // listen to root of winner notification/indication to stop
    MPI_Irecv(&winner, 1, MPI_INT, 10, 1, MPI_COMM_WORLD, &winnerNotification);
}
if (id < 10)
{
    // on each draw a child has a 20% chance to stop (finish work):
    // keep spinning while the draw (1..10) is 3 or more
    while (((rand() % 10) + 1) >= 3) ;
    MPI_Send(&id, 1, MPI_INT, 10, 0, MPI_COMM_WORLD);
    std::cout << id << " sending to root" << std::endl;
    // receive winner notification
    MPI_Status status;
    MPI_Wait(&winnerNotification, &status);
    // Process winner notification
}
else if (id == 10)
{
    // block until every child's completion message has arrived
    MPI_Status statuses[10];
    MPI_Waitall(10, reqs, statuses);
    // if all done, send notification to players to stop
    {
        winner = 1;
        for (int i = 0; i < 10; i++)
        {
            MPI_Send(&winner, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
        }
        std::cout << "root sent notification of winner" << std::endl;
    }
}