Double free on exception in Boost.Test

Keywords: free, exception, Boost, Test      Updated: 2023-10-16

I am running into a problem when executing Boost.Test test cases on a cluster. The error is: *** glibc detected *** ...myprogram.test: corrupted double-linked list: 0x000000000096b4d0 ***

Running Valgrind gives me:

==9687== Invalid free() / delete / delete[] / realloc()
==9687==    at 0x4A06016: operator delete(void*) (vg_replace_malloc.c:480)
==9687==    by 0x3A81035D2C: __cxa_finalize (in /lib64/libc-2.12.so) 
==9687==    by 0x721CD05: ??? (in /lib/libboost_unit_test_framework-gcc71-mt-d-1_65_1.so.1.65.1)
==9687==    by 0x72ABF9C: ??? (in /lib/libboost_unit_test_framework-gcc71-mt-d-1_65_1.so.1.65.1)
==9687==    by 0x3A81035991: exit (in /lib64/libc-2.12.so)
==9687==    by 0x3A8101ED23: (below main) (in /lib64/libc-2.12.so)   
==9687==  Address 0x9919d80 is 0 bytes inside a block of size 18 free'd
==9687==    at 0x4A06016: operator delete(void*) (vg_replace_malloc.c:480)
==9687==    by 0x3A81035991: exit (in /lib64/libc-2.12.so)
==9687==    by 0x3A8101ED23: (below main) (in /lib64/libc-2.12.so)   

The stack trace from GDB looks like this:

#0  0x0000003a81032495 in raise () from /lib64/libc.so.6
#1  0x0000003a81033c75 in abort () from /lib64/libc.so.6
#2  0x0000003a810703a7 in __libc_message () from /lib64/libc.so.6
#3  0x0000003a81075dee in malloc_printerr () from /lib64/libc.so.6
#4  0x0000003a810761f3 in malloc_consolidate () from /lib64/libc.so.6
#5  0x0000003a81078c18 in _int_free () from /lib64/libc.so.6
#6  0x00000000005feae8 in boost::checked_array_delete<char> (x=0x991a20 "21035070201:") at /include/boost-1_65_1/boost/core/checked_delete.hpp:41
#7  0x00000000005fbd21 in boost::scoped_array<char>::~scoped_array (this=0x94bd80, __in_chrg=<optimized out>) at /include/boost-1_65_1/boost/smart_ptr/scoped_array.hpp:69
#8  0x00000000005f9d36 in boost::execution_monitor::~execution_monitor (this=0x94bd60, __in_chrg=<optimized out>) at /include/boost-1_65_1/boost/test/execution_monitor.hpp:316
#9  0x00000000005fbd3c in boost::unit_test::unit_test_monitor_t::~unit_test_monitor_t (this=0x94bd60, __in_chrg=<optimized out>) at /include/boost-1_65_1/boost/test/unit_test_monitor.hpp:33
#10 0x0000003a81035992 in exit () from /lib64/libc.so.6
#11 0x0000003a8101ed24 in __libc_start_main () from /lib64/libc.so.6
#12 0x00000000005f5b59 in _start ()

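It happens whenever an uncaught exception is thrown (including a failing test), and also in some other circumstances that are currently unknown. The crash on exception, however, is 100% reproducible.

The program itself seems fine, since it runs locally without any such crash. So I assume this is caused by an incompatibility between some modules on the cluster.

To rule this out, I recompiled Boost and OpenBLAS, but I am still using several other libraries that I would rather not rebuild (it takes a lot of time) just to test each of them. These are libSSH2, GPI2 and HDF5. They do not show up in the ldd output, though, so I assume they are linked statically (I am not the author of the tests), and I consider them unlikely to cause the problem. The ldd output is: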

linux-vdso.so.1 =>
libpthread.so.0 => /lib64/libpthread.so.0
librt.so.1 => /lib64/librt.so.1
libboost_filesystem-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_filesystem-gcc71-mt-d-1_65_1.so.1.65.1
libboost_program_options-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_program_options-gcc71-mt-d-1_65_1.so.1.65.1
libboost_coroutine-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_coroutine-gcc71-mt-d-1_65_1.so.1.65.1
libboost_context-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_context-gcc71-mt-d-1_65_1.so.1.65.1
libboost_iostreams-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_iostreams-gcc71-mt-d-1_65_1.so.1.65.1
libboost_regex-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_regex-gcc71-mt-d-1_65_1.so.1.65.1
libboost_thread-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_thread-gcc71-mt-d-1_65_1.so.1.65.1
libboost_date_time-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_date_time-gcc71-mt-d-1_65_1.so.1.65.1
libboost_chrono-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_chrono-gcc71-mt-d-1_65_1.so.1.65.1
libboost_atomic-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_atomic-gcc71-mt-d-1_65_1.so.1.65.1
libboost_system-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_system-gcc71-mt-d-1_65_1.so.1.65.1
libboost_serialization-gcc71-mt-d-1_65_1.so.1.65.1 => /lib/libboost_serialization-gcc71-mt-d-1_65_1.so.1.65.1
libdl.so.2 => /lib64/libdl.so.2
libssl.so.10 => /usr/lib64/libssl.so.10
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2
libkrb5.so.3 => /lib64/libkrb5.so.3
libcom_err.so.2 => /lib64/libcom_err.so.2
libk5crypto.so.3 => /lib64/libk5crypto.so.3
libresolv.so.2 => /lib64/libresolv.so.2
libcrypto.so.10 => /usr/lib64/libcrypto.so.10
libz.so.1 => /lib64/libz.so.1
libstdc++.so.6 => /sw/global/compilers/gcc/7.1.0/lib64/libstdc++.so.6
libm.so.6 => /lib64/libm.so.6
libgcc_s.so.1 => /sw/global/compilers/gcc/7.1.0/lib64/libgcc_s.so.1
libc.so.6 => /lib64/libc.so.6
/lib64/ld-linux-x86-64.so.2
libbz2.so.1 => /lib64/libbz2.so.1
liblzma.so.0 => /usr/lib64/liblzma.so.0
libicudata.so.42 => /usr/lib64/libicudata.so.42
libicui18n.so.42 => /usr/lib64/libicui18n.so.42
libicuuc.so.42 => /usr/lib64/libicuuc.so.42
libkrb5support.so.0 => /lib64/libkrb5support.so.0
libkeyutils.so.1 => /lib64/libkeyutils.so.1
libselinux.so.1 => /lib64/libselinux.so.1
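ldd does not list libraries that are loaded later via dlopen, and it resolves paths in the environment where you run it, which on a cluster may differ between a login node and a compute node. It can therefore help to ask the running test itself which shared objects are really mapped (for example, which libstdc++.so.6, and how many copies of the Boost.Test library). A minimal sketch, assuming glibc on Linux; loaded_objects and count_loaded are hypothetical helper names built on glibc's dl_iterate_phdr:

```cpp
// Sketch (assumes glibc on Linux): list the shared objects that are really
// mapped into this process, as resolved on the node where the test runs.
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info
#include <cstddef>
#include <string>
#include <vector>

// Called once per loaded object; `data` points at the result vector.
static int collect_object(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
    auto* names = static_cast<std::vector<std::string>*>(data);
    const char* name = info->dlpi_name;
    // The main program is visited first and has an empty dlpi_name.
    names->emplace_back((name != nullptr && name[0] != '\0') ? name : "<main program>");
    return 0;  // returning 0 keeps the iteration going
}

// Hypothetical helper: paths of all currently loaded objects, in load order.
std::vector<std::string> loaded_objects() {
    std::vector<std::string> names;
    dl_iterate_phdr(collect_object, &names);
    return names;
}

// Hypothetical helper: how many loaded objects contain `fragment` in their
// path, e.g. "libstdc++" or "unit_test_framework".
std::size_t count_loaded(const std::string& fragment) {
    std::size_t n = 0;
    for (const auto& name : loaded_objects())
        if (name.find(fragment) != std::string::npos) ++n;
    return n;
}
```

Printing every entry of loaded_objects() from inside a test on the compute node, and comparing it with the ldd listing, would show whether the GCC 7.1 runtime from /sw/global is really the one in use there.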

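From what I have found, I think the second free is the "correct" one, because it is the smart pointer releasing its memory. So the first delete is the wrong one, but it originates from inside exit, which does not help me.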

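How can I find out why and where this pointer is freed twice? Note that I do not have root on the cluster, so debug symbols for the GCC libraries are not available.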
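One way to get a stack trace of the offending delete without debug symbols for the system libraries is to replace the global operator new/operator delete in the test executable itself, which you can build with -g. The sketch below is a do-it-yourself assumption, not a Boost or glibc facility, and the dbg namespace and its members are hypothetical names. It assumes glibc on x86-64 (malloc returns 16-byte aligned memory). Each block gets a 16-byte header with a state tag; deleted blocks are kept in quarantine instead of being returned to malloc, so a repeated delete is counted and reported instead of corrupting the heap. Because the executable's definitions normally take precedence over libstdc++'s defaults during dynamic symbol resolution, allocations made inside the Boost shared libraries should go through these functions as well. Memory is leaked on purpose, so use it only while debugging.

```cpp
// Debugging sketch (hypothetical, not a Boost or glibc facility): replace the
// global operator new/delete in the test executable to detect double deletes.
// Deleted blocks are never handed back to malloc ("quarantine"), so their
// header stays readable and a second delete of the same address is reported.
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <new>

namespace dbg {
constexpr std::size_t kHeader = 16;  // assumes 16-byte aligned malloc (glibc, x86-64)
constexpr std::uint64_t kLive = 0x11C0FFEE11C0FFEEULL;   // arbitrary "allocated" marker
constexpr std::uint64_t kFreed = 0xDEADBEEFDEADBEEFULL;  // arbitrary "deleted" marker
std::atomic<std::size_t> double_frees{0};                // repeated deletes seen so far

void* allocate(std::size_t n) {
    if (n > SIZE_MAX - kHeader) throw std::bad_alloc();
    void* raw = std::malloc(n + kHeader);
    if (raw == nullptr) throw std::bad_alloc();
    *static_cast<std::uint64_t*>(raw) = kLive;  // tag lives in the hidden header
    return static_cast<char*>(raw) + kHeader;
}

void release(void* p) noexcept {
    if (p == nullptr) return;
    auto* tag = reinterpret_cast<std::uint64_t*>(static_cast<char*>(p) - kHeader);
    // Note: this check-then-mark is not atomic; concurrent deletes may race.
    if (*tag == kFreed) {
        ++double_frees;
        std::fprintf(stderr, "double delete of %p\n", p);  // put a gdb breakpoint here
        return;
    }
    if (*tag != kLive) {
        // Not produced by allocate() (or the header was overwritten): report, leave alone.
        std::fprintf(stderr, "delete of unknown block %p\n", p);
        return;
    }
    *tag = kFreed;  // quarantine: mark as deleted but never call free()
}
}  // namespace dbg

void* operator new(std::size_t n) { return dbg::allocate(n); }
void* operator new[](std::size_t n) { return dbg::allocate(n); }
void* operator new(std::size_t n, const std::nothrow_t&) noexcept {
    try { return dbg::allocate(n); } catch (...) { return nullptr; }
}
void* operator new[](std::size_t n, const std::nothrow_t&) noexcept {
    try { return dbg::allocate(n); } catch (...) { return nullptr; }
}
void operator delete(void* p) noexcept { dbg::release(p); }
void operator delete[](void* p) noexcept { dbg::release(p); }
void operator delete(void* p, std::size_t) noexcept { dbg::release(p); }
void operator delete[](void* p, std::size_t) noexcept { dbg::release(p); }
```

With a breakpoint on the line that prints "double delete", the gdb backtrace shows the second delete. Since gdb disables address randomization by default and quarantined addresses are never reused, the reported address is often the same on the next run, so a conditional breakpoint on dbg::release for that address may catch the first delete too.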

I am using GCC 7.1 and Boost 1.65.1, although I have also tried other Boost versions and GCC 5.3.
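Since several compiler and Boost versions have been involved, it may be worth confirming from inside the test binary what it was actually compiled with and which glibc it runs against on the node. A small sketch; build_report is a hypothetical helper, __VERSION__ is predefined by GCC and Clang, __GLIBCXX__ and _GLIBCXX_USE_CXX11_ABI are libstdc++ configuration macros, BOOST_LIB_VERSION comes from boost/version.hpp when that header is available, and gnu_get_libc_version is glibc-specific:

```cpp
// Sketch: describe the toolchain this binary was compiled with and the glibc
// it runs against. build_report() is a hypothetical helper name.
#include <string>
#if defined(__GLIBC__)
#include <gnu/libc-version.h>  // gnu_get_libc_version(), glibc only
#endif
#if defined(__has_include)
#if __has_include(<boost/version.hpp>)
#include <boost/version.hpp>   // BOOST_LIB_VERSION, e.g. "1_65_1"
#endif
#endif

std::string build_report() {
    std::string r;
#if defined(__VERSION__)
    r += "compiler: " __VERSION__ "\n";  // fixed at compile time
#endif
#if defined(__GLIBCXX__)
    r += "libstdc++ headers (date stamp): " + std::to_string(__GLIBCXX__) + "\n";
#endif
#if defined(_GLIBCXX_USE_CXX11_ABI)
    r += "libstdc++ dual ABI flag: " + std::to_string(_GLIBCXX_USE_CXX11_ABI) + "\n";
#endif
#if defined(BOOST_LIB_VERSION)
    r += "Boost headers: " BOOST_LIB_VERSION "\n";
#endif
#if defined(__GLIBC__)
    r += std::string("glibc at run time: ") + gnu_get_libc_version() + "\n";
#endif
    return r;
}
```

Printing this report both locally and on the cluster would make any difference in the toolchain visible side by side.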

I reduced one test case to:

  • linking against the library
  • BOOST_AUTO_TEST_CASE(...)
  • std::runtime_error

So the problem is somewhere in the static initialization/finalization of the library.
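If the problem sits in static initialization/finalization, it can help to see how far exit-time cleanup has progressed when glibc aborts. A sketch with hypothetical names: global tracer objects placed in the test executable log their construction and destruction, and a handler registered with std::atexit from inside a test case marks the start of exit-time cleanup. Comparing where these lines appear relative to the "corrupted double-linked list" message shows roughly which phase the crash falls into.

```cpp
// Debugging sketch with hypothetical names: log static initialization and
// finalization of the test executable's own globals.
#include <cstdio>
#include <cstdlib>

struct FinalizationTracer {
    static int alive;  // number of tracers currently constructed
    const char* label;

    explicit FinalizationTracer(const char* l) : label(l) {
        ++alive;
        std::fprintf(stderr, "[static init] %s\n", label);
    }
    ~FinalizationTracer() {
        --alive;
        std::fprintf(stderr, "[static fini] %s\n", label);
    }
    FinalizationTracer(const FinalizationTracer&) = delete;
    FinalizationTracer& operator=(const FinalizationTracer&) = delete;
};

int FinalizationTracer::alive = 0;  // constant-initialized before any constructor runs

// Within one translation unit these are constructed in order of definition
// and destroyed in reverse order during exit().
FinalizationTracer g_first("first global of the test executable");
FinalizationTracer g_last("last global of the test executable");

// When registered with std::atexit after the globals above are constructed
// (e.g. from a test case), it runs during exit() before their destructors.
void mark_exit_start() {
    std::fprintf(stderr, "[exit] atexit handlers running\n");
}
```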

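Are you using datasets (data-driven test cases)?

If so, you may be running into https://svn.boost.org/trac10/ticket/13380

I ran into and analyzed this issue earlier here: Boost's data-driven test join operator "+" corrupts the first column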