Using Intel's PMU library to profile the number of cache hits/misses

Keywords: cache, Intel, PMU. Updated: 2023-10-16

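Is it possible to use Intel's PMU library to count the cache hits/misses of a specific section of code in a C program? The counts seem to be polluted by other applications running on the system.

Does the library support isolating the cache statistics that correspond to one particular code fragment (i.e., without interference from other applications running on the system)?

Here is the code snippet I tested with: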

SystemCounterState before = getSystemCounterState();
SystemCounterState after = getSystemCounterState();
cout << "===========================================================" << endl;
cout << "Instructions per Clock: " << getIPC(before, after) <<
    "\nL2 cache hits: " << getL2CacheHits(before, after) <<
    "\nL2 cache misses: " << getL2CacheMisses(before, after) <<
    "\nL2 cache hit ratio: " << getL2CacheHitRatio(before, after) <<
    "\nL3 cache hits: " << getL3CacheHits(before, after) <<
    "\nL3 cache misses: " << getL3CacheMisses(before, after) <<
    "\nL3 cache hit ratio: " << getL3CacheHitRatio(before, after) <<
    "\nWasted cycles caused by L3 misses: " << getCyclesLostDueL3CacheMisses(before, after) <<
    "\nBytes read from DRAM: " << getBytesReadFromMC(before, after) << endl;
cout << "===========================================================" << endl;

These are the statistics I get (note that the cache hit/miss counts are high even though I am not doing any computation):

===========================================================
Instructions per Clock: 0.410805
L2 cache hits: 2677
L2 cache misses: 2658
L2 cache hit ratio: 0.501781
L3 cache hits: 2151
L3 cache misses: 507
L3 cache hit ratio: 0.809255
Wasted cycles caused by L3 misses: 0.0242752
Bytes read from DRAM: 514048
===========================================================

Thanks in advance.

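Even when you are "not doing any computation", just printing is in fact computation.

You are writing to the C++ stream "cout", which causes quite a lot of code to execute. If you want to see this for yourself, compile this program: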

#include <iostream>
using namespace std;
int main()
{
    int i;
    i = 1;
    cout << "Hello World" << endl;
    i = 2;
}

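Using gdb, set a breakpoint on the cout line and then step with the "stepi" command. You will see how many instructions are executed while the "cout" statement runs.

All of these instructions access memory, both to fetch the instructions themselves and to reach the data they use, and this can cause quite a few cache misses.

You might want to try grabbing the counters without doing any printing.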
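As a rough illustration of that suggestion, here is a minimal sketch of taking both snapshots around the region of interest, keeping them in plain variables, and deferring all output until afterwards. The names `RegionSnapshots`, `measureRegion` and `subtractBaseline` are hypothetical helpers made up for this example and are not part of Intel's library. The snapshot function is passed in as a parameter so the pattern can be tried with any counter source.

```cpp
#include <cstdint>

// Hypothetical helper type (not part of Intel's library): holds the two
// counter snapshots taken around a region of code.
template <typename State>
struct RegionSnapshots {
    State before;  // state captured just before the region runs
    State after;   // state captured just after the region finishes
};

// Take a snapshot, run the region, take another snapshot.
// No I/O happens in here, so stream code cannot inflate the counts.
template <typename Snapshot, typename Work>
auto measureRegion(Snapshot snapshot, Work work)
    -> RegionSnapshots<decltype(snapshot())> {
    auto before = snapshot();
    work();
    auto after = snapshot();
    return {before, after};
}

// Subtract the count observed for an empty region (the cost of taking the
// snapshots themselves). Clamps at zero because counter readings are noisy.
inline std::uint64_t subtractBaseline(std::uint64_t measured,
                                      std::uint64_t baseline) {
    return measured > baseline ? measured - baseline : 0;
}
```

With the library from the question, `snapshot` could presumably be `getSystemCounterState`, and a delta could then be obtained with, for example, `getL2CacheMisses(r.before, r.after)`. Measuring an empty region first gives a baseline to pass to `subtractBaseline`. Print only after all measurements are finished. Note that this only removes the overhead of your own process; it does not by itself filter out activity from other applications.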