How can I understand my Valgrind error message?
I am getting the following error message from Valgrind:
==1808== 0 bytes in 1 blocks are still reachable in loss record 1 of 1,734
==1808== at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==1808== by 0x4CC2BA9: hwloc_build_level_from_list (topology.c:1603)
==1808== by 0x4CC2BA9: hwloc_connect_levels (topology.c:1774)
==1808== by 0x4CC2F25: hwloc_discover (topology.c:2091)
==1808== by 0x4CC2F25: opal_hwloc132_hwloc_topology_load (topology.c:2596)
==1808== by 0x4C60957: orte_odls_base_open (odls_base_open.c:205)
==1808== by 0x632FDB3: ???
==1808== by 0x4C3B6B9: orte_init (orte_init.c:127)
==1808== by 0x403E0E: orterun (orterun.c:693)
==1808== by 0x4035E3: main (main.c:13)
==1808==
==1808== 0 bytes in 1 blocks are still reachable in loss record 2 of 1,734
==1808== at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==1808== by 0x4CC2BD5: hwloc_build_level_from_list (topology.c:1603)
==1808== by 0x4CC2BD5: hwloc_connect_levels (topology.c:1775)
==1808== by 0x4CC2F25: hwloc_discover (topology.c:2091)
==1808== by 0x4CC2F25: opal_hwloc132_hwloc_topology_load (topology.c:2596)
==1808== by 0x4C60957: orte_odls_base_open (odls_base_open.c:205)
==1808== by 0x632FDB3: ???
==1808== by 0x4C3B6B9: orte_init (orte_init.c:127)
==1808== by 0x403E0E: orterun (orterun.c:693)
==1808== by 0x4035E3: main (main.c:13)
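I can't work out what kind of problem Valgrind is reporting. Could someone explain it?
I have checked every use of new. They are all deleted properly.
Besides the Valgrind error messages, I also get another kind of error from MPI at the end of the run: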
---------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1811 on node laki.pi.ingv.it exited on signal 11 (Segmentation fault).
----------------------------------------------------------------------
This is the error message related to MPI_Init:
==31198== 0 bytes in 1 blocks are still reachable in loss record 1 of 368
==31198== at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==31198== by 0xC66DE49: hwloc_build_level_from_list (topology.c:1603)
==31198== by 0xC66DE49: hwloc_connect_levels (topology.c:1774)
==31198== by 0xC66E1C5: hwloc_discover (topology.c:2091)
==31198== by 0xC66E1C5: opal_hwloc132_hwloc_topology_load (topology.c:2596)
==31198== by 0xC62B473: opal_hwloc_unpack (hwloc_base_dt.c:83)
==31198== by 0xC6270AB: opal_dss_unpack_buffer (dss_unpack.c:120)
==31198== by 0xC62815F: opal_dss_unpack (dss_unpack.c:84)
==31198== by 0xC5F2349: orte_util_nidmap_init (nidmap.c:146)
==31198== by 0xED98608: ???
==31198== by 0xC5DC0B9: orte_init (orte_init.c:127)
==31198== by 0xC59DBAE: ompi_mpi_init (ompi_mpi_init.c:357)
==31198== by 0xC5B443F: PMPI_Init (pinit.c:84)
==31198== by 0x55FA53: main (solver_2d.hpp:22)
Line solver_2d.hpp:22 consists entirely of:
MPI_Init(&argc, &argv);
In addition, the error messages related to MPI_Finalize() are:
==31198== 1 errors in context 1 of 58:
==31198== Syscall param write(buf) points to uninitialised byte(s)
==31198== at 0x38EF00E6FD: ??? (in /lib64/libpthread-2.12.so)
==31198== by 0x11F1F548: ???
==31198== by 0x11F1E03F: ???
==31198== by 0x11CD7FBA: ???
==31198== by 0x11CE519A: ???
==31198== by 0x11CE3C37: ???
==31198== by 0x11CD90C1: ???
==31198== by 0x11AC2E36: ???
==31198== by 0xC59ECC4: ompi_mpi_finalize (ompi_mpi_finalize.c:285)
==31198== by 0x562185: main (solver_2d.hpp:171)
==31198== Address 0x1ffeffda24 is on thread 1's stack
==31198== Uninitialised value was created by a stack allocation
==31198== at 0x11CCE050: ???
and
==31197== Syscall param write(buf) points to uninitialised byte(s)
==31197== at 0x38EF00E6FD: ??? (in /lib64/libpthread-2.12.so)
==31197== by 0x11F1F548: ipath_cmd_write (in /usr/lib64/libinfinipath.so.4.0)
==31197== by 0x11F1E03F: ipath_poll_type (in /usr/lib64/libinfinipath.so.4.0)
==31197== by 0x11CD7FBA: psmi_context_interrupt_set (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197== by 0x11CE519A: ips_ptl_rcvthread_fini (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197== by 0x11CE3C37: ??? (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197== by 0x11CD90C1: psm_ep_close (in /usr/lib64/libpsm_infinipath.so.1.15)
==31197== by 0x11AC2E36: ompi_mtl_psm_finalize (mtl_psm.c:200)
==31197== by 0xC59ECC4: ompi_mpi_finalize (ompi_mpi_finalize.c:285)
==31197== by 0x562185: main (solver_2d.hpp:171)
==31197== Address 0x1ffeffda24 is on thread 1's stack
==31197== in frame #2, created by ipath_poll_type (???:)
==31197== Uninitialised value was created by a stack allocation
==31197== at 0x11CCE050: ??? (in /usr/lib64/libpsm_infinipath.so.1.15)
where line solver_2d.hpp:171 corresponds to:
MPI_Finalize();
Finally, the error message corresponding to MPI_WRITE, or more precisely to MPI_File_open, reads:
==31198== 48 bytes in 1 blocks are still reachable in loss record 104 of 368
==31198== at 0x4A05E7D: malloc (vg_replace_malloc.c:309)
==31198== by 0xC58C750: opal_obj_new (opal_object.h:469)
==31198== by 0xC58C750: ompi_attr_set_c (attribute.c:761)
==31198== by 0xC5AA0BE: PMPI_Attr_put (pattr_put.c:58)
==31198== by 0x118501AB: ???
==31198== by 0x11843159: ???
==31198== by 0x1185657D: ???
==31198== by 0xC5CEFB5: module_init (io_base_file_select.c:442)
==31198== by 0xC5CEFB5: mca_io_base_file_select (io_base_file_select.c:214)
==31198== by 0xC5977A5: ompi_file_open (file.c:128)
==31198== by 0xC5C6557: PMPI_File_open (pfile_open.c:96)
==31198== by 0x5638A1: p_fstream (p_fstream.hpp:86)
Line p_fstream.hpp:86 is:
MPI_File_open(MPI_COMM_WORLD, const_cast<char*>(fname.c_str()), flags, MPI_INFO_NULL, &mpi_file);
The Valgrind messages report memory leaks in mpirun, which you probably do not care about.
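I assume you ran
valgrind mpirun a.out
but what you really want is to look for incorrect memory accesses and leaks in the MPI application itself. In that case, you should run
mpirun valgrind a.out
Note that the output of all tasks will be interleaved. Since you are using Open MPI, you can run
mpirun --tag-output valgrind a.out
so that each task's output is prefixed with its rank.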
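If tagged but interleaved output is still hard to read, one option is to have each process write its own report file. The following is a minimal sketch, not a definitive recipe: `build_cmd` is a hypothetical helper, and the rank count and `./a.out` are placeholders for your own job. It relies on Valgrind's `--log-file` option, where `%p` is expanded to the process ID, and adds `--leak-check=full` and `--track-origins=yes` for more detailed reports.

```shell
#!/bin/sh
# Sketch only: build_cmd is a hypothetical helper; the rank count and
# program path are placeholders for your own job.

# Print a command line in which Valgrind wraps each MPI rank, not mpirun.
# --log-file with %p gives one report file per process (named by PID).
build_cmd() {
    np="$1"
    prog="$2"
    echo "mpirun -np $np valgrind --leak-check=full --track-origins=yes --log-file=valgrind.%p.log $prog"
}

# Show the command for 2 ranks; copy the printed line to run it.
build_cmd 2 ./a.out
```

The script only prints the command so that you can inspect it before running it. Reports such as the "still reachable" blocks allocated inside orte_init or hwloc come from Open MPI's own internals; Open MPI installations usually ship a suppression file (typically share/openmpi/openmpi-valgrind.supp under the installation prefix, though you should check your own install) that can be passed to Valgrind with `--suppressions=` to hide them.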