Valgrind堆栈完全错过了一个功能

Valgrind stack misses a function completely

本文关键字：一个功能堆栈错过了 Valgrind 更新时间：2023-10-16

我有两个c文件:

交流

void main(){
    ...
    getvtable()->function();
}

虚值表指向位于b.c中的函数:

void function(){
    malloc(42);
}

现在，如果我在valgrind中跟踪程序，我会得到以下结果:

==29994== 4,155 bytes in 831 blocks are definitely lost in loss record 26 of 28
==29994==    at 0x402CB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==29994==    by 0x40A24D2: (below main) (libc-start.c:226)

所以对函数的调用完全在堆栈上提交!这怎么可能?如果我使用GDB，则显示包含"function"的正确堆栈。

包含调试符号，Linux, 32位。

乌利希期刊指南:

回答第一个问题后，在调试valgrind的GDB服务器时得到以下输出。当我直接使用GDB进行调试时，断点没有出现。

stasik@gemini:~$ gdb -q
(gdb) set confirm off
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 11665
[Switching to Thread 11665]
0x040011d0 in ?? ()
(gdb) file /home/stasik/leak.so
Reading symbols from /home/stasik/leak.so...done.
(gdb) break function
Breakpoint 1 at 0x110c: file ../../source/leakclass.c, line 32.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>end
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) source thread-frames.py
Stack level 0, frame at 0x42348a0:
 eip = 0x404efcb; saved eip 0x4f2f544c
 called by frame at 0x42348a4
 Arglist at 0x4234898, args:
 Locals at 0x4234898, Previous frame's sp is 0x42348a0
 Saved registers:
  ebp at 0x4234898, eip at 0x423489c
Stack level 1, frame at 0x42348a4:
 eip = 0x4f2f544c; saved eip 0x6e492056
 called by frame at 0x42348a8, caller of frame at 0x42348a0
 Arglist at 0x423489c, args:
 Locals at 0x423489c, Previous frame's sp is 0x42348a4
 Saved registers:
  eip at 0x42348a0
Stack level 2, frame at 0x42348a8:
 eip = 0x6e492056; saved eip 0x205d6f66
 called by frame at 0x42348ac, caller of frame at 0x42348a4
 Arglist at 0x42348a0, args:
 Locals at 0x42348a0, Previous frame's sp is 0x42348a8
 Saved registers:
  eip at 0x42348a4
Stack level 3, frame at 0x42348ac:
 eip = 0x205d6f66; saved eip 0x61746144
---Type <return> to continue, or q <return> to quit---
 called by frame at 0x42348b0, caller of frame at 0x42348a8
 Arglist at 0x42348a4, args:
 Locals at 0x42348a4, Previous frame's sp is 0x42348ac
 Saved registers:
  eip at 0x42348a8
Stack level 4, frame at 0x42348b0:
 eip = 0x61746144; saved eip 0x65736162
 called by frame at 0x42348b4, caller of frame at 0x42348ac
 Arglist at 0x42348a8, args:
 Locals at 0x42348a8, Previous frame's sp is 0x42348b0
 Saved registers:
  eip at 0x42348ac
Stack level 5, frame at 0x42348b4:
 eip = 0x65736162; saved eip 0x70616d20
 called by frame at 0x42348b8, caller of frame at 0x42348b0
 Arglist at 0x42348ac, args:
 Locals at 0x42348ac, Previous frame's sp is 0x42348b4
 Saved registers:
  eip at 0x42348b0
Stack level 6, frame at 0x42348b8:
 eip = 0x70616d20; saved eip 0x2e646570
 called by frame at 0x42348bc, caller of frame at 0x42348b4
 Arglist at 0x42348b0, args:
---Type <return> to continue, or q <return> to quit---
 Locals at 0x42348b0, Previous frame's sp is 0x42348b8
 Saved registers:
  eip at 0x42348b4
Stack level 7, frame at 0x42348bc:
 eip = 0x2e646570; saved eip 0x0
 called by frame at 0x42348c0, caller of frame at 0x42348b8
 Arglist at 0x42348b4, args:
 Locals at 0x42348b4, Previous frame's sp is 0x42348bc
 Saved registers:
  eip at 0x42348b8
Stack level 8, frame at 0x42348c0:
 eip = 0x0; saved eip 0x0
 caller of frame at 0x42348bc
 Arglist at 0x42348b8, args:
 Locals at 0x42348b8, Previous frame's sp is 0x42348c0
 Saved registers:
  eip at 0x42348bc
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) continue
Continuing.

我认为有两个可能的原因:

Valgrind使用与GDB不同的堆栈展开方法
地址空间布局是不同的，而在两个环境下运行你的程序，你只击中堆栈损坏在Valgrind

我们可以通过使用Valgrind的内置gdbserver来获得更多的信息。

保存Python代码片段到thread-frames.py

import gdb
f = gdb.newest_frame()
while f is not None:
    f.select()
    gdb.execute('info frame')
    f = f.older()

t.gdb

set confirm off
file MY-PROGRAM
break function
commands
silent
end
run
source thread-frames.py
quit

v.gdb

set confirm off
target remote | vgdb
file MY-PROGRAM
break function
commands
silent
end
continue
source thread-frames.py
quit

(根据需要更改上述脚本中的MY-PROGRAM、function)

获取GDB下栈帧的详细信息:

$ gdb -q -x t.gdb
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbffff2f0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbffff310
 source language c.
 Arglist at 0xbffff2e8, args: 
 Locals at 0xbffff2e8, Previous frame's sp is 0xbffff2f0
 Saved registers:
  ebp at 0xbffff2e8, eip at 0xbffff2ec
Stack level 1, frame at 0xbffff310:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0xb7e33963
 caller of frame at 0xbffff2f0
 source language c.
 Arglist at 0xbffff2f8, args: 
 Locals at 0xbffff2f8, Previous frame's sp is 0xbffff310
 Saved registers:
  ebp at 0xbffff2f8, eip at 0xbffff30c

在Valgrind下获取相同的数据:

$ valgrind --vgdb=full --vgdb-error=0 ./MY-PROGRAM

在另一个shell中:

$ gdb -q -x v.gdb
relaying data between gdb and process 574
0x04001020 in ?? ()
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbe88e2c0:
 eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
 called by frame at 0xbe88e2e0
 source language c.
 Arglist at 0xbe88e2b8, args: 
 Locals at 0xbe88e2b8, Previous frame's sp is 0xbe88e2c0
 Saved registers:
  ebp at 0xbe88e2b8, eip at 0xbe88e2bc
Stack level 1, frame at 0xbe88e2e0:
 eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0x4051963
 caller of frame at 0xbe88e2c0
 source language c.
 Arglist at 0xbe88e2c8, args: 
 Locals at 0xbe88e2c8, Previous frame's sp is 0xbe88e2e0
 Saved registers:
  ebp at 0xbe88e2c8, eip at 0xbe88e2dc

如果GDB在连接到"valgrind——GDB "时可以成功地展开堆栈，那么这是valgrind的堆栈展开算法的问题。您可以仔细检查"信息帧"输出，以查找内联和尾部调用帧或其他可能导致Valgrind失败的原因。否则可能是堆栈损坏

好吧，编译所有。so部分和带有显式- 0的主程序似乎解决了问题。似乎是加载。so (so总是未优化编译)的"核心"程序的一些优化破坏了堆栈。

这就是尾部调用优化。

函数function调用malloc作为它做的最后一件事。编译器看到这一点，并在调用malloc之前杀死function 的堆栈帧。这样做的好处是，当malloc返回时，它直接返回到调用function的函数。也就是说，它避免了malloc返回到function只是为了再点击一条返回指令。

在这种情况下，优化防止了不必要的跳转，使堆栈使用稍微更有效，这很好，但在递归尾部调用的情况下，这种优化是一个巨大的胜利，因为它把递归变成了更像迭代的东西。

正如您已经发现的那样，禁用优化使调试更加容易。如果你想调试优化的代码(也许是为了性能测试)，那么，正如@臧明杰已经说过的，你可以用-fno-optimize-sibling-calls禁用这个优化。