Segfault, but not in valgrind or gdb

Updated: 2023-10-16

In my project there is a library containing code that loads FBX files using Autodesk's FBX SDK 2017.1.

Loading an FBX file crashes in both debug and release builds. The crash happens in two different ways, seemingly at random:

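  • Most of the time, the crash is just a plain "Segmentation fault".
  • Once in a while, the crash prints a dump of all libraries possibly involved and hints at a problem in a realloc() call. From the context of the message I cannot tell which realloc it might be (the message is followed by a dump of all linked libraries).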

The code does contain realloc() calls, specifically when allocating the buffers used by a custom implementation of FbxStream.
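For reference, this is the usual shape of a realloc-based grow step, as a minimal sketch only. `ByteBuffer`, `grow` and `release` are hypothetical names; this is not the Engine::Buffer from the question, and nothing here is known to be the cause of the crash. The point it illustrates is that realloc may move the block, so any pointer taken from the old block (for example one cached by a stream) is stale after a successful grow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical growable byte buffer (an assumption for illustration,
// not the Engine::Buffer used in the question).
struct ByteBuffer {
    unsigned char* data = nullptr;
    std::size_t capacity = 0;
};

// Grows the buffer to at least newCapacity bytes.
// On failure the old block is left untouched and is still owned by buf.
inline bool grow(ByteBuffer& buf, std::size_t newCapacity) {
    if (newCapacity <= buf.capacity) return true;
    void* p = std::realloc(buf.data, newCapacity);
    if (p == nullptr) return false;               // buf.data is still valid here
    // realloc may have moved the block: pointers into the old block are stale.
    buf.data = static_cast<unsigned char*>(p);
    buf.capacity = newCapacity;
    return true;
}

// Frees the block and resets the buffer to its empty state.
inline void release(ByteBuffer& buf) {
    std::free(buf.data);
    buf.data = nullptr;
    buf.capacity = 0;
}
```

Assigning the result of realloc straight back to the only copy of the pointer would leak the block on failure, which is why the sketch goes through a temporary.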

Most code paths are exactly the same as on Windows; only a few platform-specific parts were reimplemented. On Windows it runs as expected.

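What strikes me is that if I run the program in gdb or valgrind, the crash disappears! So I started looking for uninitialized members/values, but so far I can't find anything suspicious. I used CppDepend/CppCheck and VS2012 code analysis, but both came up empty on uninitialized variables/members.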

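To give some background on FBX loading: the FBX SDK has a number of ways to deal with different types of resources (obj, 3ds, fbx, ...). They can be loaded from a file or from a stream. To support large files, the stream option is the more relevant one. The code below is far from perfect, but for now I'm mostly interested in why valgrind/gdb would not crash. I left the SDK documentation on top of ReadString, since it is the most complex method.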

class MyFbxStream : public FbxStream {
    uint32 m_FormatID;
    uint32 m_Error;
    EState m_State;
    size_t m_Pos;
    size_t m_Size;
    const Engine::Buffer* const m_Buffer;
    MyFbxStream& operator = (const MyFbxStream& other) const;
public:
    MyFbxStream(const Engine::Buffer* const buffer)
        : m_FormatID(0)
        , m_Error(0)
        , m_State(eClosed)
        , m_Pos(0)
        , m_Size(0)
        , m_Buffer(buffer) {};
    virtual ~MyFbxStream() {};
    virtual bool Open(void* pStreamData) {
        m_FormatID = *(uint32*)pStreamData;
        m_Pos = 0;
        m_State = eOpen;
        m_Size = m_Buffer->GetSize();
        return true;
    }
    virtual bool Close() {
        m_Pos = m_Size = 0;
        m_State = eClosed;
        return true;
    }
    virtual int Read(void* pData, int pSize) const {
        const unsigned char* data = (m_Buffer->GetBase(m_Pos));
        const size_t bytesRead = m_Pos + pSize > m_Buffer->GetSize() ? (m_Buffer->GetSize() - m_Pos) : pSize;
        const_cast<MyFbxStream*>(this)->m_Pos += bytesRead;
        memcpy(pData, data, bytesRead);
        return (int)bytesRead;
    }
    /** Read a string from the stream.
    * The default implementation is written in terms of Read() but does not cope with DOS line endings.
    * Subclasses may need to override this if DOS line endings are to be supported.
    * \param pBuffer Pointer to the memory block where the read bytes are stored.
    * \param pMaxSize Maximum number of bytes to be read from the stream.
    * \param pStopAtFirstWhiteSpace Stop reading when any whitespace is encountered. Otherwise read to end of line (like fgets()).
    * \return pBuffer, if successful, else NULL.
    * \remark The default implementation terminates the \e pBuffer with a null character and assumes there is enough room for it.
    * For example, a call with \e pMaxSize = 1 will fill \e pBuffer with the null character only. */
    virtual char* ReadString(char* pBuffer, int pMaxSize, bool pStopAtFirstWhiteSpace = false) {
        assert(!pStopAtFirstWhiteSpace); // "Not supported"
        const size_t pSize = pMaxSize - 1;
        if (pSize) {
            const char* const base = (const char* const)m_Buffer->GetBase();
            char* cBuffer = pBuffer;
            const size_t totalSize = std::min(m_Buffer->GetSize(), (m_Pos + pSize));
            const char* const maxSize = base + totalSize;
            const char* sum = base + m_Pos;
            bool done = false;
            // first align the copy on alignment boundary (4byte)
            while ((((size_t)sum & 0x3) != 0) && (sum < maxSize)) {
                const unsigned char c = *sum++;
                *cBuffer++ = c;
                if ((c == '\n') || (c == '\r')) {
                    done = true;
                    break;
                }
            }
            // copy from alignment boundary to boundary (4byte)
            if (!done) {
                int64 newBytesRead = 0;
                uint32* dBuffer = (uint32*)cBuffer;
                const uint32* dBase = (uint32*)sum;
                const uint32* const dmaxSize = ((uint32*)maxSize) - 1;
                while (dBase < dmaxSize) {
                    const uint32 data = *(const uint32*const)dBase++;
                    *dBuffer++ = data;
                    if (((data & 0xff) == 0x0a) || ((data & 0xff) == 0x0d)) { // third bytes, 4 bytes read..
                        newBytesRead -= 3;
                        done = true;
                        break;
                    } else {
                        const uint32 shiftedData8 = data & 0xff00;
                        if ((shiftedData8 == 0x0a00) || (shiftedData8 == 0x0d00)) { // third bytes, 3 bytes read..
                            newBytesRead -= 2;
                            done = true;
                            break;
                        } else {
                            const uint32 shiftedData16 = data & 0xff0000;
                            if ((shiftedData16 == 0x0a0000) || (shiftedData16 == 0x0d0000)) { // second byte, 2 bytes read..
                                newBytesRead -= 1;
                                done = true;
                                break;
                            } else {
                                const uint32 shiftedData24 = data & 0xff000000;
                                if ((shiftedData24 == 0x0a000000) || (shiftedData24 == 0x0d000000)) { // first byte, 1 bytes read..
                                    done = true;
                                    break;
                                }
                            }
                        }
                    }
                }
                newBytesRead += (int64)dBuffer - (int64)cBuffer;
                if (newBytesRead) {
                    sum += newBytesRead;
                    cBuffer += newBytesRead;
                }
            }
            // copy anything beyond the last alignment boundary (4byte)
            if (!done) {
                while (sum < maxSize) {
                    const unsigned char c = *sum++;
                    *cBuffer++ = c;
                    if ((c == '\n') || (c == '\r')) {
                        done = true;
                        break;
                    }
                }
            }
            const size_t bytesRead = cBuffer - pBuffer;
            if (bytesRead) {
                const_cast<MyFbxStream*>(this)->m_Pos += bytesRead;
                pBuffer[bytesRead] = 0;
                return pBuffer;
            }
        }
        pBuffer = NULL;
        return NULL;
    }
    virtual void Seek(const FbxInt64& pOffset, const FbxFile::ESeekPos& pSeekPos) {
        switch (pSeekPos) {
        case FbxFile::ESeekPos::eBegin:     m_Pos = pOffset; break;
        case FbxFile::ESeekPos::eCurrent:   m_Pos += pOffset; break;
        case FbxFile::ESeekPos::eEnd:       m_Pos = m_Size - pOffset; break;
        }
    }
    virtual long GetPosition() const        {   return (long)m_Pos; }
    virtual void SetPosition(long position) {   m_Pos = position;   }
    virtual void ClearError()               {   m_Error = 0;    }
    virtual int GetError() const            {   return m_Error; }
    virtual EState GetState()               {   return m_State; }
    virtual int GetReaderID() const         {   return m_FormatID;  }
    virtual int GetWriterID() const         {   return -1;  }                       // readonly stream
    virtual bool Flush()                    {   return true;    }                   // readonly stream
    virtual int Write(const void* /*d*/, int /*s*/) {   assert(false);  return 0; } // readonly stream
};
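As a point of comparison for the ReadString above, here is a minimal sketch of a byte-wise line reader over an in-memory buffer. It is not the SDK's default implementation, and it is not offered as the diagnosis of the crash. `readLine` is a hypothetical helper; `base`, `size` and `pos` stand in for m_Buffer->GetBase(), m_Buffer->GetSize() and m_Pos. It copies at most maxSize - 1 bytes, stops after the first line-ending character, never reads past the end of the source, and always leaves room for the terminator, without any 4-byte stores into the destination.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the FBX SDK: reads one line from the
// in-memory range [base, base + size), starting at pos and advancing it.
// Copies at most maxSize - 1 bytes, stops after the first '\n' or '\r'
// (the line ending is kept, like fgets()), and always null-terminates.
// Returns out if at least one byte was read, otherwise nullptr.
inline char* readLine(const char* base, std::size_t size, std::size_t& pos,
                      char* out, int maxSize) {
    assert(base != nullptr);
    if (out == nullptr || maxSize < 1) return nullptr;
    out[0] = '\0';
    if (pos >= size) return nullptr;   // end of data, or pos seeked past the end
    const std::size_t room = static_cast<std::size_t>(maxSize) - 1;
    std::size_t n = 0;
    while (n < room && pos < size) {
        const char c = base[pos++];
        out[n++] = c;
        if (c == '\n' || c == '\r') break;
    }
    out[n] = '\0';
    return n != 0 ? out : nullptr;
}
```

A byte-wise loop like this is slower than word-wise copying, but it makes the bounds on both the source and the destination easy to check by inspection.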

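I assume there may be undefined behavior related to malloc/free/realloc operations that somehow does not occur in gdb. But if that were the case, I would also expect the Windows binaries to have problems.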

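Also, I don't know whether this is relevant, but when I trace into the Open() function and print the value of the "m_Buffer" pointer (or "this"), I get a pointer value starting with 0xfffffff.., which to a Windows programmer looks like a problem. But can I draw the same conclusion on Linux, given that I also see this in static function calls and so on?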

"If I run the program in gdb or valgrind, the crash disappears!"

There are two likely explanations:

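  1. There are multiple threads and the code has a data race; both GDB and Valgrind significantly change execution timing.
  2. GDB disables address randomization, and Valgrind significantly changes the program's memory layout; the crash is sensitive to the exact layout.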
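To see the second effect for yourself, a small probe like the following can help. It is only a sketch: `Layout`, `sampleLayout`, `printLayout` and `g_marker` are names made up for this illustration. It records one address each from the stack, the heap and static storage. On a typical Linux system with address randomization enabled, the printed values tend to differ from run to run when started from the shell and stay the same across runs started inside GDB with its default settings. Whether the static address moves depends on how the executable was built (for example, whether it is position independent).

```cpp
#include <cstdint>
#include <cstdio>
#include <memory>

// Hypothetical probe for illustration: one address from each memory region.
struct Layout {
    std::uintptr_t stack;   // address of a local variable
    std::uintptr_t heap;    // address of a heap allocation
    std::uintptr_t global;  // address of an object with static storage
};

static int g_marker = 0;    // object with static storage duration

// Samples the three addresses. They are stored as integers only so they
// can be printed and compared; they are never dereferenced afterwards.
inline Layout sampleLayout() {
    int local = 0;
    std::unique_ptr<int> heapValue(new int(0));
    Layout l;
    l.stack  = reinterpret_cast<std::uintptr_t>(&local);
    l.heap   = reinterpret_cast<std::uintptr_t>(heapValue.get());
    l.global = reinterpret_cast<std::uintptr_t>(&g_marker);
    return l;
}

// Prints the sampled addresses in hexadecimal.
inline void printLayout(const Layout& l) {
    std::printf("stack  %#llx\n", static_cast<unsigned long long>(l.stack));
    std::printf("heap   %#llx\n", static_cast<unsigned long long>(l.heap));
    std::printf("global %#llx\n", static_cast<unsigned long long>(l.global));
}
```

Calling printLayout(sampleLayout()) from main and comparing a few runs inside and outside the debugger shows how much the layout moves. It also shows that very high stack addresses are normal on Linux, so a large pointer value by itself does not indicate corruption.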

The steps I would take:

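  1. Set ulimit -c unlimited, run the program and let it dump core, then do a post-mortem analysis of the core in GDB.
  2. Run the program under GDB with set disable-randomization off and see whether you can reach the crash point that way.
  3. Run the program under Helgrind or DRD (Valgrind's thread error detectors).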
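If the program is launched from somewhere you cannot easily control the shell limits (an IDE, a service, a test runner), step 1 can also be approximated from inside the process. This is a sketch for POSIX systems only; `enableCoreDumps` is a hypothetical helper name. It raises the soft core-file limit to the hard limit, so it only amounts to "unlimited" when the hard limit is itself unlimited.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raises this process's soft RLIMIT_CORE to its hard
// limit, roughly what `ulimit -c unlimited` does in the shell when the
// hard limit allows it. Returns true on success.
inline bool enableCoreDumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
    lim.rlim_cur = lim.rlim_max;   // soft limit may be raised up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Call it early in main, before the FBX loading code runs. Where the core file ends up, and whether one is written at all, also depends on system configuration, so check that before concluding that no core was produced.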