Using memcpy on mmap'ed region crashes, a for loop does not

Updated: 2023-10-16

I have an NVIDIA Tegra TK1 processor module on a carrier board with a PCIe slot. Plugged into that PCIe slot is an FPGA board that exposes some registers and a 64K memory region over PCIe.

The ARM CPU on the Tegra board runs a minimal Linux installation.

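I'm using /dev/mem and the mmap function to obtain user-space pointers to the register structures and to the 64K memory region. The register files and memory blocks are all assigned addresses that are aligned to 4KB memory pages and do not overlap them. I explicitly map whole pages with mmap, using the result of getpagesize(), which is also 4096.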
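The page arithmetic behind "map whole pages" can be pulled out into a small pure function, which is easier to check than the full /dev/mem path. This is a minimal sketch assuming 64-bit integer arithmetic and a power-of-two page size; the names `PageWindow` and `computePageWindow` are hypothetical and not taken from the question's code, although the math mirrors what the snippet further down computes.

```cpp
#include <cassert>
#include <cstdint>

// Describes the page-aligned window that mmap() needs in order to cover a
// physical range, plus where the requested address lies inside that window.
struct PageWindow {
    uint64_t mapOffset;    // page-aligned offset to pass as mmap()'s offset argument
    uint64_t offsetInPage; // add this to mmap()'s return value to reach physAddr
    uint64_t mapLength;    // whole pages covering [physAddr, physAddr + size)
};

// Hypothetical helper. pageSize must be a power of two, e.g. getpagesize().
PageWindow computePageWindow(uint64_t physAddr, uint64_t size, uint64_t pageSize) {
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    assert(size != 0);
    const uint64_t mask = pageSize - 1;
    const uint64_t first = physAddr & ~mask;               // round start down to a page
    const uint64_t end = (physAddr + size + mask) & ~mask; // round end up to a page
    return PageWindow{first, physAddr - first, end - first};
}
```

A range that straddles a page boundary (for example 8 bytes starting 4 bytes before the end of a page) needs two pages, which is the case the `lastMappedPageIdx` computation in the question handles.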

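I can read and write those exposed registers just fine. I can also read from the memory region (64KB) word by word as uint32 in a for loop, and that works too: the contents read are correct.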

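However, if I use std::memcpy on the same address range, the Tegra CPU always freezes. I don't see any error message, and with GDB attached I don't see anything in Eclipse when I try to step over the memcpy line either; it just stops hard. I have to reset the CPU with the hardware reset button, because the remote console is frozen.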

This is a debug build without optimization (-O0), using gcc-linaro-6.3.1-2017.05-i686-mingw32_arm-linux-gnueabihf. I was told the 64K region is byte-accessible, but I have not tried that explicitly.

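Is there an actual (potential) problem I need to worry about, or is there a specific reason why memcpy doesn't work and perhaps shouldn't be used in this situation in the first place, so that I can keep using my for loop without giving it another thought?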

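EDIT: I've observed another effect. The original code snippet was missing a "vital" printf inside the copy for loop, placed before the memory read. With that printf removed, I don't get valid data back. I have now updated the snippet to do an extra read from the same address instead of the printf, which also yields correct data. The confusion grows.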

Here is the important excerpt of what (I think) is going on, slightly modified into this "de-fluffed" form.

// void* physicalAddr: PCIe "BAR0" address as reported by dmesg, added to the physical address offset of FPGA memory region
// long size: size of the physical region to be mapped 
//--------------------------------
// doing the memory mapping
//
const uint32_t pageSize = getpagesize();
assert( IsPowerOfTwo( pageSize ) );
const uint32_t physAddrNum = (uint32_t) physicalAddr;
const uint32_t offsetInPage = physAddrNum & (pageSize - 1);
const uint32_t firstMappedPageIdx = physAddrNum / pageSize;
const uint32_t lastMappedPageIdx = (physAddrNum + size - 1) / pageSize;
const uint32_t mappedPagesCount = 1 + lastMappedPageIdx - firstMappedPageIdx;
const uint32_t mappedSize = mappedPagesCount * pageSize;
const off_t targetOffset = physAddrNum & ~(off_t)(pageSize - 1);
m_fileID = open( "/dev/mem", O_RDWR | O_SYNC );
// addr passed as null means: the kernel chooses where to place the mapping. A non-null addr (without MAP_FIXED) is only taken as a "hint" where to place it.
void* mapAtPageStart = mmap( 0, mappedSize, PROT_READ | PROT_WRITE, MAP_SHARED, m_fileID, targetOffset );
if (MAP_FAILED != mapAtPageStart)
{
m_userSpaceMappedAddr = (volatile void*) ( uint32_t(mapAtPageStart) + offsetInPage );
}
//--------------------------------
// Accessing the mapped memory
//
//void* m_rawData: <== m_userSpaceMappedAddr
//uint32_t* destination: points to a stack object
//int length: size in 32bit words of the stack object (a struct with only U32's in it)
// this crashes:
std::memcpy( destination, m_rawData, length * sizeof(uint32_t) );
// this does not, AND does yield correct memory contents - but only with a preceding extra read
for (int i=0; i<length; ++i)
{
// This extra read makes the data gotten in the 2nd read below valid.
// Commented out, the data read into destination will not be valid.
uint32_t tmp = ((const volatile uint32_t*)m_rawData)[i];
(void)tmp; //pacify compiler
destination[i] = ((const volatile uint32_t*)m_rawData)[i];
}

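From your description, it looks like your FPGA logic does not respond correctly to load instructions that read from locations on the FPGA, which locks up the CPU. It is not crashing; it is stalled permanently, which is why a hard reset is needed. I ran into this problem too while debugging PCIe logic on an FPGA.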

Another sign that your logic is not responding correctly is that you need an extra read to get the correct response.

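Your loop performs 32-bit loads, but memcpy performs loads of at least 64 bits, which changes how your logic has to respond. For example, it may need to return the completion as two TLPs, with the first 32 bits of the response in the first 128-bit TLP of the completion and the second 32 bits in the second 128-bit TLP.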
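Until the FPGA logic is fixed, one CPU-side workaround is to make sure every access to the mapped region stays 32 bits wide, as the question's for loop already does. This is a minimal sketch assuming the device handles single 32-bit reads correctly; `copyFromDevice32` is a hypothetical name. It only avoids the wider loads that memcpy is free to issue; it does not repair the completion logic, and it does not explain the extra-read effect from the question's edit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies `words` 32-bit words out of device memory.
// Reading through a pointer to volatile forces the compiler to perform one
// load per element, in order, instead of merging them into wider accesses
// the way a library memcpy may do. (On 32-bit ARM with GCC, a volatile
// uint32_t access is in practice a single 32-bit load.)
void copyFromDevice32(uint32_t* dst, const volatile uint32_t* src, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) {
        dst[i] = src[i];
    }
}
```

With the question's names this would be called as `copyFromDevice32(destination, static_cast<const volatile uint32_t*>(m_rawData), length)`. Note that std::memcpy cannot legally take a volatile source pointer anyway without casting the qualifier away.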

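What I found extremely useful was adding logic that records all PCIe transactions into an SRAM, plus the ability to dump that SRAM to see how the logic is behaving or misbehaving. We have a nifty utility, pcieflat, that prints one PCIe TLP per line. It even has documentation.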
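To illustrate what such a log makes visible: the first header DWORD of each TLP carries its format, type and length, so a 32-bit load and a wider memcpy load should typically show up as read requests with different Length fields (the exact split depends on the root complex). The sketch below is not pcieflat and does not reproduce its output format; `describeTlpDw0` is a hypothetical decoder for header DW0, using the field layout from the PCIe base specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical one-line summary of a TLP's first header DWORD (DW0).
// Field layout: Fmt = bits 31:29, Type = bits 28:24, Length = bits 9:0 (DWORDs).
std::string describeTlpDw0(uint32_t dw0) {
    const unsigned fmt = (dw0 >> 29) & 0x7u;   // 000/001: no data, 010/011: with data
    const unsigned type = (dw0 >> 24) & 0x1Fu; // 00000: memory request, 01010: completion
    unsigned length = dw0 & 0x3FFu;
    if (length == 0) length = 1024;            // a Length of 0 encodes 1024 DWORDs
    if (fmt > 3) return "other (TLP prefix or reserved)";
    const bool hasData = (fmt & 0x2u) != 0;
    if (type == 0x00) {
        return std::string(hasData ? "MWr" : "MRd") + " len=" + std::to_string(length);
    }
    if (type == 0x0A) {
        // A completion without data (Cpl) has no meaningful Length.
        return hasData ? "CplD len=" + std::to_string(length) : std::string("Cpl");
    }
    return "other fmt=" + std::to_string(fmt) + " type=" + std::to_string(type);
}
```

In a log decoded this way, the for loop's reads would be expected to appear as `MRd len=1` requests, while memcpy's wider loads would appear with a larger Length, each of which the FPGA must answer with correctly formed CplD TLPs.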

When the PCIe interface wasn't working well enough for that, I streamed the log out in hex over a UART, and the result could be decoded by pcieflat.
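On the host side, a hex stream like that only needs to be turned back into 32-bit words before it can be decoded. This is a minimal sketch assuming each log record arrives as one line of hex digits, 8 per 32-bit word, most significant word first; that line format and the name `parseHexLogLine` are hypothetical and not the actual format pcieflat consumes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: parses one captured UART line back into 32-bit words.
// Returns false (and leaves `words` empty) if the line is not a whole number
// of 8-digit hex words.
bool parseHexLogLine(const std::string& line, std::vector<uint32_t>& words) {
    words.clear();
    std::string digits;
    for (char c : line) {
        if (c == '\r' || c == '\n') break;  // end of record
        if (!std::isxdigit(static_cast<unsigned char>(c))) return false;
        digits.push_back(c);
    }
    if (digits.empty() || digits.size() % 8 != 0) return false;
    for (std::size_t i = 0; i < digits.size(); i += 8) {
        // 8 hex digits always fit in unsigned long (at least 32 bits).
        words.push_back(static_cast<uint32_t>(std::stoul(digits.substr(i, 8), nullptr, 16)));
    }
    return true;
}
```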

This tool is also useful for debugging performance problems: you can see how well your DMA reads and writes are pipelined.

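Alternatively, if you have an integrated logic analyzer or something similar on the FPGA, you can trace the activity that way. Having the TLPs parsed according to the PCIe protocol is nicer, though.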