MPI_collective communication

I am trying to write a quicksort code in MPI. The parallel algorithm is simple: the root scatters the list over MPI_COMM_WORLD, each node then runs qsort() on its subarray, and MPI_Gather() is used to bring all the subarrays back to the root so it can run qsort on them once more. Simple enough, but I am getting an error. My guess was that the subarray size might be wrong, since it is just the list size divided by comm_size, which could easily lead to a segmentation fault. However, I use a list of size 1000 and 4 processes, so the division gives 250, and there should be no segmentation fault, yet there is one. Can you tell me where I went wrong?

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    int array[1000];
    int arrsize;
    int chunk;
    int *subarray;
    int rank;
    int comm_size;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
    {
        time_t t;
        srand((unsigned)time(&t));
        int arrsize = sizeof(array) / sizeof(int);
        for (int i = 0; i < arrsize; i++)
            array[i] = rand() % 1000;
        printf("\n this is processor %d and the unsorted array is:", rank);
        printArray(array, arrsize);
    }

    MPI_Scatter(array, arrsize, MPI_INT, subarray, chunk, MPI_INT, 0, MPI_COMM_WORLD);
    chunk = (int)(arrsize / comm_size);
    subarray = (int *)calloc(arrsize, sizeof(int));

    if (rank != 0)
    {
        qsort(subarray, chunk, sizeof(int), comparetor);
    }

    MPI_Gather(subarray, chunk, MPI_INT, array, arrsize, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
    {
        qsort(array, arrsize, sizeof(int), comparetor);
        printf("\n this is processor %d and this is sorted array: ", rank);
        printArray(array, arrsize);
    }

    free(subarray);
    MPI_Finalize();
    return 0;
}

The error says:

Invalid MIT-MAGIC-COOKIE-1 key[h:04865] *** Process received signal ***
[h:04865] Signal: Segmentation fault (11)
[h:04865] Signal code: Address not mapped (1)
[h:04865] Failing at address: 0x421e45
[h:04865] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7f1906b29210]
[h:04865] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18e533)[0x7f1906c71533]
[h:04865] [ 2] /lib/x86_64-linux-gnu/libopen-pal.so.40(+0x4054f)[0x7f190699654f]
[h:04865] [ 3] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_datatype_sndrcv+0x51a)[0x7f1906f3288a]
[h:04865] [ 4] /lib/x86_64-linux-gnu/libmpi.so.40(ompi_coll_base_scatter_intra_basic_linear+0x12c)[0x7f1906f75dec]
[h:04865] [ 5] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Scatter+0x10d)[0x7f1906f5952d]
[h:04865] [ 6] ./parallelQuickSortMPI(+0xc8a5)[0x5640c424b8a5]
[h:04865] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f1906b0a0b3]
[h:04865] [ 8] ./parallelQuickSortMPI(+0xc64e)[0x5640c424b64e]
[h:04865] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node h exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The cause of the segmentation fault is in the following lines.

MPI_Scatter(array, arrsize, MPI_INT, subarray, chunk, MPI_INT, 0, MPI_COMM_WORLD);
chunk = (int)(arrsize / comm_size);
subarray = (int *)calloc(arrsize, sizeof(int));

You allocate subarray and compute the chunk size only after the MPI_Scatter operation. MPI_Scatter is a collective operation, so the necessary memory (e.g., the receive buffer) and the receive count must be declared and defined before the call.

chunk = (int)(arrsize / comm_size);
subarray = (int *)calloc(arrsize, sizeof(int));
MPI_Scatter(array, arrsize, MPI_INT, subarray, chunk, MPI_INT, 0, MPI_COMM_WORLD);

The above is the correct approach. With this change you will only move the segmentation fault, though.
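
Beyond reordering the allocation, there are further issues in the posted code that the moved segmentation fault points to: the MPI_Scatter send count and the MPI_Gather receive count are per-process counts and should be chunk rather than arrsize; arrsize is only assigned inside the rank-0 branch (in a shadowing declaration at that), so it is uninitialized wherever the collectives use it; and rank 0 never sorts its own chunk. Below is a minimal sketch of how the whole program could look with those points addressed. It is my own illustration, not code from the question: the comparator is a placeholder standing in for the question's comparetor, the printArray calls are replaced by a simple printf, and it assumes arrsize is evenly divisible by comm_size.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Placeholder comparator standing in for the question's comparetor. */
static int comparetor(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int array[1000];
    int rank, comm_size;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* arrsize is computed on every rank; no shadowed declaration inside rank 0. */
    int arrsize = sizeof(array) / sizeof(int);
    int chunk = arrsize / comm_size;   /* assumes arrsize is divisible by comm_size */

    if (rank == 0)
    {
        srand((unsigned)time(NULL));
        for (int i = 0; i < arrsize; i++)
            array[i] = rand() % 1000;
    }

    /* Allocate the receive buffer before the collective call. */
    int *subarray = calloc(chunk, sizeof(int));

    /* Send count and receive count are both the per-process chunk, not arrsize. */
    MPI_Scatter(array, chunk, MPI_INT, subarray, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    /* Every rank, including the root, sorts its own chunk. */
    qsort(subarray, chunk, sizeof(int), comparetor);

    /* The root receives chunk elements from each rank. */
    MPI_Gather(subarray, chunk, MPI_INT, array, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
    {
        /* The gathered array is a sequence of sorted chunks; a final qsort
           (or a k-way merge) produces the fully sorted result. */
        qsort(array, arrsize, sizeof(int), comparetor);
        printf("sorted: first = %d, last = %d\n", array[0], array[arrsize - 1]);
    }

    free(subarray);
    MPI_Finalize();
    return 0;
}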