MPI Matrix Multiplication Using Scatter and Gather

Keywords: MPI. Last updated: 2023-10-16

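I am trying to compute a matrix multiplication with the MPI_Scatter() and MPI_Gather() functions, and I would like to be able to choose the matrix size without having to change the number of processes used.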

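I have looked through posts such as "MPI Matrix Multiplication with MPI_Scatter and MPI_Gather" and "matrix multiplication using scatter", but they all use approaches that stop working once a larger matrix size is defined. They only work when the matrix size is the same as the number of processes/nodes.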

My code, with an example matrix size of 8:

#define MAT_SIZE 8
void initialiseMatricies(float a[][MAT_SIZE], float b[][MAT_SIZE], float c[][MAT_SIZE])
{
    int num = 11;
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            a[i][j] = num;
            b[i][j] = num+1;
            c[i][j] = 0;
        }
        num++;
    }
}
int main(int argc, char **argv)
{   
    // MPI Variables
    int rank, size;
    // Create the main matrices with the predefined size
    float matrixA[MAT_SIZE][MAT_SIZE];
    float matrixB[MAT_SIZE][MAT_SIZE];
    float matrixC[MAT_SIZE][MAT_SIZE];
    // Create the separate arrays for storing the scattered rows from the main matrices
    float matrixARows[MAT_SIZE];
    float matrixCRows[MAT_SIZE];
    // Initialise the matrices
    initialiseMatricies(matrixA, matrixB, matrixC);
    // Start the MPI parallel sequence
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int count = MAT_SIZE * MAT_SIZE / (size * (MAT_SIZE / size));
    // Scatter rows of first matrix to different processes
    MPI_Scatter(matrixA, count, MPI_FLOAT, matrixARows, count, MPI_FLOAT, 0, MPI_COMM_WORLD);
    // Broadcast second matrix to all processes
    MPI_Bcast(matrixB, MAT_SIZE * MAT_SIZE, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    // Matrix Multiplication
    int sum = 0;
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            sum += matrixARows[j] * matrixB[j][i];
        }
        matrixCRows[i] = sum;
    }
    // Gather the row sums from the buffer and put it in matrix C
    MPI_Gather(matrixCRows, count, MPI_FLOAT, matrixC, count, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    // if it's on the master node
    if (rank == 0)
        printResults(matrixA, matrixB, matrixC, calcTime);
    return 0;
}

Output:

1364 2728 4092 5456 6820 8184 9548 10912 
1488 2976 4464 5952 7440 8928 10416 11904 
1612 3224 4836 6448 8060 9672 11284 12896 
1736 3472 5208 6944 8680 10416 12152 13888 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0

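The output is correct, and if I set the number of processes to 8 (the same as the matrix size), the entire matrix is calculated correctly, but that is not what I want. I believe my problem stems from the count passed to MPI_Scatter() and MPI_Gather(). If I set the count to: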

int count = MAT_SIZE * MAT_SIZE / size;

then the output becomes:

1364 2728 4092 5456 6820 8184 9548 10912 
-1.07374e+08 -1.07374e+08 11 11 11 11 11 11 
1612 3224 4836 6448 8060 9672 11284 12896 
-1.07374e+08 -1.07374e+08 13 13 13 13 13 13 
1860 3720 5580 7440 9300 11160 13020 14880 
-1.07374e+08 -1.07374e+08 15 15 15 15 15 15 
2108 4216 6324 8432 10540 12648 14756 16864 
-1.07374e+08 -1.07374e+08 17 17 17 17 17 17

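because the count essentially goes from 8 (before) to 16, and I get a debug error for each process:

"Run-Time Check Failure #2 - Stack around the variable 'matrixC' was corrupted."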

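I have been changing this for several days and still cannot figure it out. I have also tried changing the start and end iterations of the matrix multiplication loops, but that did not get me anywhere either.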

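To allow larger matrix sizes, the separate arrays should be 2D arrays, with one dimension set to the segment size, which is based on the number of tasks/processes: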

float matrixARows[MAT_SIZE/size][MAT_SIZE];
float matrixCRows[MAT_SIZE/size][MAT_SIZE];

The count should be:

int count = MAT_SIZE * MAT_SIZE / size;

The matrix multiplication then changes to:

int sum = 0;
for (int k = 0; k < MAT_SIZE/size; k++)
{
    for (int i = 0; i < MAT_SIZE; i++)
    {
        for (int j = 0; j < MAT_SIZE; j++)
        {
            sum += matrixARows[k][j] * matrixB[j][i];
        }
        matrixCRows[k][i] = sum;
        sum = 0;
    }
}

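Note: the matrix size must be divisible by the number of tasks/processes. For example, if 4 tasks are used, the matrix size must be a multiple of 4, such as 4, 8, 16, 32, 64, 128, and so on.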
