MPI result is different under Slurm and by using command
I ran into a problem running an MPI program under Slurm. a1 is my executable. When I simply run mpiexec -np 4 ./a1, it works fine, but when I run it under Slurm it does not finish; it looks like it stalls partway through.
Here is the output from mpiexec -np 4 ./a1, which is correct:
Processor1 will send and receive with processor0
Processor3 will send and receive with processor0
Processor0 will send and receive with processor1
Processor0 finished send and receive with processor1
Processor1 finished send and receive with processor0
Processor2 will send and receive with processor0
Processor1 will send and receive with processor2
Processor2 finished send and receive with processor0
Processor0 will send and receive with processor2
Processor0 finished send and receive with processor2
Processor0 will send and receive with processor3
Processor0 finished send and receive with processor3
Processor3 finished send and receive with processor0
Processor1 finished send and receive with processor2
Processor2 will send and receive with processor1
Processor2 finished send and receive with processor1
Processor0: I am very good, I save the hash in range 0 to 65
p: 4
Tp: 8.61754
Processor1 will send and receive with processor3
Processor3 will send and receive with processor1
Processor3 finished send and receive with processor1
Processor1 finished send and receive with processor3
Processor2 will send and receive with processor3
Processor1: I am very good, I save the hash in range 65 to 130
Processor2 finished send and receive with processor3
Processor3 will send and receive with processor2
Processor3 finished send and receive with processor2
Processor3: I am very good, I save the hash in range 195 to 260
Processor2: I am very good, I save the hash in range 130 to 195
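For reference, the ranges in the correct output (0 to 65, 65 to 130, 130 to 195, 195 to 260) are consistent with an even block partition of 260 hashes over 4 ranks. A minimal sketch of that partition (the function name and the total of 260 are illustrative, not taken from the original code):

```python
def block_range(rank, nprocs, total):
    """Contiguous even block of `total` items assigned to `rank` out of `nprocs`."""
    size = total // nprocs  # 260 // 4 == 65 items per rank
    return rank * size, (rank + 1) * size

for rank in range(4):
    lo, hi = block_range(rank, 4, 260)
    print(f"Processor{rank}: range {lo} to {hi}")
```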
And here is the output under Slurm; it never returns the full result the way the plain command does:
Processor0 will send and receive with processor1
Processor2 will send and receive with processor0
Processor3 will send and receive with processor0
Processor1 will send and receive with processor0
Processor0 finished send and receive with processor1
Processor1 finished send and receive with processor0
Processor0 will send and receive with processor2
Processor0 finished send and receive with processor2
Processor2 finished send and receive with processor0
Processor1 will send and receive with processor2
Processor0 will send and receive with processor3
Processor2 will send and receive with processor1
Processor2 finished send and receive with processor1
Processor2 will send and receive with processor3
Processor1 finished send and receive with processor2
Here is my slurm.sh file. I suspect I made some mistake in it that makes the result differ from the command line, but I am not sure...
#!/bin/bash
####### select partition (check CCR documentation)
#SBATCH --partition=general-compute --qos=general-compute
####### set memory that nodes provide (check CCR documentation, e.g., 32GB)
#SBATCH --mem=64000
####### make sure no other jobs are assigned to your nodes
#SBATCH --exclusive
####### further customizations
#SBATCH --job-name="a1"
#SBATCH --output=%j.stdout
#SBATCH --error=%j.stderr
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00
mpiexec -np 4 ./a1
Coming back to answer my own question: I made a silly mistake and used the wrong slurm.sh with my MPI code. The correct slurm.sh is:
#!/bin/bash
####### select partition (check CCR documentation)
#SBATCH --partition=general-compute --qos=general-compute
####### set memory that nodes provide (check CCR documentation, e.g., 32GB)
#SBATCH --mem=32000
####### make sure no other jobs are assigned to your nodes
#SBATCH --exclusive
####### further customizations
#SBATCH --job-name="a1"
#SBATCH --output=%j.stdout
#SBATCH --error=%j.stderr
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=12
#SBATCH --time=01:00:00
####### check modules to see which version of MPI is available
####### and use appropriate module if needed
module load intel-mpi/2018.3
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
srun ./a1
That was careless of me, which is why I use Conan as my nickname... I hope I can get smarter.