Find out where heap memory gets corrupted

Keywords: location, corruption, memory. Last updated: 2023-10-16

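I know there are already many similar questions and answers, but I have not been able to solve my problem.

In my large application the heap is getting corrupted somewhere, and I cannot find where. I have also tried tools such as gflags, with no luck.

I tried gflags on the following sample, which corrupts the heap on purpose: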

char* pBuffer = new char[256];
memset(pBuffer, 0, 256 + 1);
delete[] pBuffer;

The heap is overwritten on line 2, but how can I find that with tools such as gflags or windbg? Maybe I am not using gflags correctly.

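If automated tools (such as Electric Fence or valgrind) don't do the trick, and staring intently at your code to figure out where it might have gone wrong doesn't help, and disabling/enabling various operations to narrow it down (until you get a correlation between the presence of heap corruption and which operations were or were not executed beforehand) doesn't seem to work either, you can always try this technique, which attempts to detect the corruption sooner so that its source is easier to track down:

Create your own custom new and delete operators that place corruption-evident guard areas around each allocated memory region, something like this: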

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <new>
// make this however big you feel is "big enough" so that corrupted bytes will be seen in the guard bands
static int GUARD_BAND_SIZE_BYTES = 64;
static void * MyCustomAlloc(size_t userNumBytes)
{
    // We'll allocate space for a guard-band, then space to store the user's allocation-size-value,
    // then space for the user's actual data bytes, then finally space for a second guard-band at the end.
    char * buf = (char *) malloc(GUARD_BAND_SIZE_BYTES+sizeof(userNumBytes)+userNumBytes+GUARD_BAND_SIZE_BYTES);
    if (buf)
    {
       char * w = buf;
       memset(w, 'B', GUARD_BAND_SIZE_BYTES);          w += GUARD_BAND_SIZE_BYTES;
       memcpy(w, &userNumBytes, sizeof(userNumBytes)); w += sizeof(userNumBytes);
       char * userRetVal = w;                          w += userNumBytes;
       memset(w, 'E', GUARD_BAND_SIZE_BYTES);          w += GUARD_BAND_SIZE_BYTES;
       return userRetVal;
    }
    else throw std::bad_alloc();
}
static void MyCustomDelete(void * p)
{
    if (p == NULL) return;   // since delete NULL is a safe no-op
    // Convert the user's pointer back to a pointer to the top of our header bytes
    char * internalCP = ((char *) p)-(GUARD_BAND_SIZE_BYTES+sizeof(size_t));
    char * cp = internalCP;
    for (int i=0; i<GUARD_BAND_SIZE_BYTES; i++)
    {
        if (*cp++ != 'B')
        {
            printf("CORRUPTION DETECTED at BEGIN GUARD BAND POSITION %i of allocation %p\n", i, p);
            abort();
        }
    }
    // At this point, (cp) should be pointing to the stored (userNumBytes) field
    size_t userNumBytes = *((const size_t *)cp);
    cp += sizeof(userNumBytes);  // skip past the stored size field
    cp += userNumBytes;          // skip past the user's data
    // At this point, (cp) should be pointing to the second guard band
    for (int i=0; i<GUARD_BAND_SIZE_BYTES; i++)
    {
        if (*cp++ != 'E')
        {
            printf("CORRUPTION DETECTED at END GUARD BAND POSITION %i of allocation %p\n", i, p);
            abort();
        }
    }
    // If we got here, no corruption was detected, so free the memory and carry on
    free(internalCP);
}
// override the global C++ new/delete operators to call our
// instrumented functions rather than their normal behavior
// (dynamic exception specifications such as throw(std::bad_alloc) were removed in C++17)
void * operator new(size_t s)                {return MyCustomAlloc(s);}
void * operator new[](size_t s)              {return MyCustomAlloc(s);}
void operator delete(void * p) noexcept      {MyCustomDelete(p);}
void operator delete[](void * p) noexcept    {MyCustomDelete(p);}

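The above is enough to get you Electric Fence style functionality: if anything writes into either of the two 64-byte "guard bands" at the beginning or end of any new/delete memory allocation, then when the allocation is deleted, MyCustomDelete() will notice the corruption and crash the program.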

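If that isn't good enough (for example, because by the time the delete happens, so much has happened since the corruption that it is hard to tell what caused it), you can go a step further: have MyCustomAlloc() add each allocated buffer to a singleton/global doubly linked list of allocations, and have MyCustomDelete() remove it from that same list (if your program is multithreaded, be sure to serialize these operations!). The advantage is that you can then add another function, called e.g. CheckForHeapCorruption(), that iterates over the linked list, checks the guard bands of every allocation in it, and reports whether any of them has been corrupted. You can then sprinkle calls to CheckForHeapCorruption() throughout your code, so that when heap corruption occurs it is detected at the next call to CheckForHeapCorruption() rather than some time later. Eventually you will find that one call to CheckForHeapCorruption() passes cleanly, and the next call, just a few lines later, detects corruption. At that point you know the corruption was caused by whatever code executed between those two calls, and you can study that particular code to figure out what it is doing wrong, and/or add more calls to CheckForHeapCorruption() within that code as needed.

Repeat until the bug becomes obvious. Good luck!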
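The linked-list variant described above could be sketched roughly as follows. This is an illustration under assumptions, not the answer's actual code: the names TrackedAlloc, TrackedFree, AllocHeader and GuardsIntact are hypothetical, the functions are standalone rather than wired into operator new/delete, and CheckForHeapCorruption() returns a count instead of printing a report.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <mutex>

// Hypothetical names throughout; none of these come from a real library.
static const std::size_t kGuardBytes = 64;

// Bookkeeping record stored in front of every tracked allocation.
// alignas keeps the user pointer suitably aligned behind header + guard.
struct alignas(std::max_align_t) AllocHeader {
    AllocHeader* prev;
    AllocHeader* next;
    std::size_t  userBytes;
};

static AllocHeader* g_head = nullptr;  // head of the global doubly linked list
static std::mutex   g_listMutex;       // serializes list access across threads

// Memory layout: [AllocHeader]['B' guard][user data]['E' guard]
void* TrackedAlloc(std::size_t userBytes) {
    char* raw = static_cast<char*>(
        std::malloc(sizeof(AllocHeader) + kGuardBytes + userBytes + kGuardBytes));
    if (raw == nullptr) return nullptr;
    AllocHeader* h = reinterpret_cast<AllocHeader*>(raw);
    char* user = raw + sizeof(AllocHeader) + kGuardBytes;
    std::memset(user - kGuardBytes, 'B', kGuardBytes);
    std::memset(user + userBytes, 'E', kGuardBytes);
    h->userBytes = userBytes;
    std::lock_guard<std::mutex> lock(g_listMutex);
    h->prev = nullptr;          // push onto the front of the list
    h->next = g_head;
    if (g_head) g_head->prev = h;
    g_head = h;
    return user;
}

// True if both guard bands of one allocation still hold their fill bytes.
bool GuardsIntact(const AllocHeader* h) {
    const char* front = reinterpret_cast<const char*>(h) + sizeof(AllocHeader);
    const char* back  = front + kGuardBytes + h->userBytes;
    for (std::size_t i = 0; i < kGuardBytes; ++i) {
        if (front[i] != 'B' || back[i] != 'E') return false;
    }
    return true;
}

// Walks every live allocation; returns how many have a damaged guard band.
std::size_t CheckForHeapCorruption() {
    std::lock_guard<std::mutex> lock(g_listMutex);
    std::size_t damaged = 0;
    for (const AllocHeader* h = g_head; h != nullptr; h = h->next) {
        if (!GuardsIntact(h)) ++damaged;
    }
    return damaged;
}

void TrackedFree(void* p) {
    if (p == nullptr) return;
    AllocHeader* h = reinterpret_cast<AllocHeader*>(
        static_cast<char*>(p) - kGuardBytes - sizeof(AllocHeader));
    assert(GuardsIntact(h));    // fail loudly at free time, like MyCustomDelete
    {
        std::lock_guard<std::mutex> lock(g_listMutex);
        if (h->prev) h->prev->next = h->next; else g_head = h->next;
        if (h->next) h->next->prev = h->prev;
    }
    std::free(h);
}
```

With something like this in place, calling CheckForHeapCorruption() before and after a suspect block of code brackets the overrun: a result of 0 before and a non-zero result after points at the code in between.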

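If the same variable keeps getting corrupted, data breakpoints are a quick and easy way to find the code responsible for the change (if your IDE supports them). (Debug->New Breakpoint->New Data Breakpoint... in MS Visual Studio 2008.) They won't help if your heap corruption is more random (but I thought I'd share the simple answer in case it helps).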

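I believe a tool called Electric Fence is also supported on Windows.

Essentially, what it does is hijack malloc and co. so that every allocation ends at a page boundary, and it marks the following page inaccessible.

The effect is that you get a seg fault as soon as a buffer overrun happens.

It probably also has an option for buffer underruns.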
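To make the page-boundary trick concrete, here is a minimal sketch of the idea. It is not Electric Fence's real implementation: FencedBlock, FencedAlloc and FencedFree are hypothetical names, and the code assumes a POSIX system with mmap and mprotect (on Windows the same idea would presumably be built on VirtualAlloc and VirtualProtect).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper types and functions; POSIX-only illustration.
struct FencedBlock {
    void*       user;      // pointer handed to the caller
    void*       mapping;   // start of the whole mapping, needed for munmap
    std::size_t mapBytes;  // total mapped size, including the guard page
};

FencedBlock FencedAlloc(std::size_t userBytes) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t dataPages = (userBytes + page - 1) / page;
    const std::size_t mapBytes = (dataPages + 1) * page;  // +1 for the guard page

    void* m = mmap(nullptr, mapBytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return FencedBlock{nullptr, nullptr, 0};

    // The last page of the mapping becomes the inaccessible guard page.
    char* guard = static_cast<char*>(m) + dataPages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(m, mapBytes);
        return FencedBlock{nullptr, nullptr, 0};
    }

    // Place the user's bytes so that they end exactly where the guard begins.
    // Trade-off: the returned pointer is only as aligned as userBytes allows.
    return FencedBlock{guard - userBytes, m, mapBytes};
}

void FencedFree(const FencedBlock& b) {
    if (b.mapping != nullptr) munmap(b.mapping, b.mapBytes);
}
```

With this layout, a write to the first byte past the end of the user's buffer lands in the PROT_NONE page and faults immediately, so a debugger stops on the offending instruction. Catching underruns would need the mirror layout: guard page first, user data starting right after it.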

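Please read these links:

Visual Studio - how to find source of heap corruption errors

Is there a good Valgrind substitute for Windows?

They describe techniques for finding heap problems on Windows.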

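On the other hand, you can always write your own memory manager (if you are writing new code). The way to do it: use a wrapper API that calls malloc/calloc etc.

Suppose you have an API myMalloc(size_t len); inside that function you allocate HEADER + len + FOOTER. In the header, save information such as the size of the allocation, or possibly more. In the footer, put some magic number such as deadbeef. Then return ptr (from malloc) + HEADER from myMalloc.

When freeing it with myfree(void *ptr), just compute ptr - HEADER, read len, then jump to the footer at (ptr - HEADER) + HEADER + len, that is, right after the user's data. At that offset you should find deadbeef; if you don't, you know the block has been corrupted.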
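The header/footer wrapper described above might look roughly like this. It is a sketch under assumptions: the header is sized to alignof(std::max_align_t) so the returned pointer stays aligned, the footer is a 32-bit 0xDEADBEEF, and myfree returns a bool to report the result (the answer does not say how corruption should be reported).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Assumed sizes: the header holds the requested length and is padded to the
// platform's maximum alignment; the footer is a 4-byte magic number.
constexpr std::size_t   HEADER = alignof(std::max_align_t);
constexpr std::uint32_t FOOTER_MAGIC = 0xDEADBEEFu;
static_assert(HEADER >= sizeof(std::size_t), "header must fit the length");

// Layout: [HEADER: len][len user bytes][FOOTER: 0xDEADBEEF]
void* myMalloc(std::size_t len) {
    char* raw = static_cast<char*>(
        std::malloc(HEADER + len + sizeof(FOOTER_MAGIC)));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &len, sizeof(len));  // header: requested size
    std::memcpy(raw + HEADER + len, &FOOTER_MAGIC, sizeof(FOOTER_MAGIC));
    return raw + HEADER;                  // caller sees only the user bytes
}

// Frees the block; returns false if the footer magic had been overwritten.
bool myfree(void* ptr) {
    if (ptr == nullptr) return true;
    char* raw = static_cast<char*>(ptr) - HEADER;
    std::size_t len;
    std::memcpy(&len, raw, sizeof(len));  // read back the stored size
    std::uint32_t magic;
    std::memcpy(&magic, raw + HEADER + len, sizeof(magic));
    std::free(raw);
    return magic == FOOTER_MAGIC;
}
```

Applied to the sample from the question, `memset(p, 0, 256 + 1)` on a 256-byte block from myMalloc zeroes the first footer byte, so myfree would return false. Note that this catches overruns only; an underrun that damages the stored length would need a magic number in the header as well.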