Generic way to evaluate a function's cost at compile time

Keywords: method, cost, function, compile, evaluate; updated: 2023-10-16

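I am currently working on the implementation of a multi-dimensional array iterator. Consider iterating over two contiguous ranges (for std::equal or std::copy purposes) that represent compatible data with different layouts (row-major vs. column-major in 2D). I want to find, for each iterator, the order of strides that gives the fastest execution time.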

For example:

row of vector components = A -> m elements
row of vectors =           B -> n elements
2D plane of vectors =      C -> 3 elements
row of planes of vectors = D -> 10 elements
given the data ordered by ascending strides:
first array:   B | A | C | D
second array:  B | A | D | C
Obviously, we can iterate over both iterators by bunches of m*n elements. Then:
If we choose the first array convention, the first iterator is contiguous and the second one 
will perform (3 - 1)*(10 - 1) jumps forward with a stride of 10 and (10 - 1) jumps backward.
If we choose the second array convention, the second iterator is contiguous and the first one will 
perform (10 - 1)*(3 - 1) jumps forward with a stride of 3 and (3 - 1) jumps backward.
=> The second convention is better at everything in this example.
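
The kind of jump counting done by hand above can itself be evaluated at compile time, since sizes and strides are constants. The following is only a minimal C++14 sketch under stated assumptions: each block of m*n contiguous elements is treated as one unit, the two remaining dimensions (C and D) are walked with two nested loops, and `JumpCount` and `count_jumps` are hypothetical names that do not come from the post. How a step at a loop boundary is classified is a modelling choice, so the counts need not match the hand estimate, and a count is a static proxy, not a measured run time.

```cpp
#include <cstddef>

// Hypothetical helper, not from the original post: classifies every step
// between two consecutive block offsets during a nested-loop traversal.
struct JumpCount {
    std::size_t contiguous = 0; // next block is directly adjacent
    std::size_t forward = 0;    // non-adjacent step to a higher offset
    std::size_t backward = 0;   // step to a lower offset
};

// Strides and offsets are expressed in blocks of m*n elements.
// `inner` is the fast loop, `outer` is the slow loop.
constexpr JumpCount count_jumps(std::size_t inner_extent, std::size_t inner_stride,
                                std::size_t outer_extent, std::size_t outer_stride)
{
    JumpCount result{};
    bool first = true;
    std::size_t prev = 0;
    for (std::size_t o = 0; o < outer_extent; ++o) {
        for (std::size_t i = 0; i < inner_extent; ++i) {
            const std::size_t cur = o * outer_stride + i * inner_stride;
            if (!first) {
                if (cur == prev + 1)  ++result.contiguous;
                else if (cur > prev)  ++result.forward;
                else                  ++result.backward;
            }
            first = false;
            prev = cur;
        }
    }
    return result;
}

// First array (B | A | C | D): C has block stride 1, D has block stride 3.
// Second array (B | A | D | C): D has block stride 1, C has block stride 10.

// First convention: C is the inner loop. The first array is walked contiguously.
static_assert(count_jumps(3, 1, 10, 3).contiguous == 3 * 10 - 1,
              "first array should be fully contiguous in its own order");

// The same loop order applied to the second array, evaluated by the compiler.
constexpr JumpCount second_array_jumps = count_jumps(3, 10, 10, 1);
static_assert(second_array_jumps.contiguous + second_array_jumps.forward +
              second_array_jumps.backward == 3 * 10 - 1,
              "every step between blocks is classified exactly once");
```

For the second convention, swap the roles: `count_jumps(10, 1, 3, 10)` describes the second array and `count_jumps(10, 3, 3, 1)` the first. Which set of counts corresponds to the faster traversal is not something this model can tell.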

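Since I have to take many factors into account, such as backward and forward memory jumps, contiguity, and the iterator implementation itself (which is not trivial), I would like to run an experimental plan. However, I also know everything at compile time (sizes and strides), so it would be cool to run that experimental plan at compile time for each template instantiation. My question is: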

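Is it possible to evaluate the runtime cost of some instructions at compile time, when everything except the memory addresses of the input arrays is known at compile time?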

No. Your question is based on wrong assumptions.

Some of the bad assumptions (there are probably others):

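The function is used as-is: the compiler may inline it in many places, or it may decide to keep it as a separate function because inlining would increase code size. Since the surrounding code can then behave slightly differently, you may see different performance.
Instructions have a fixed cost: in many cases processors run instructions out of order, or execute them in parallel. If something that can take a long time, such as a division, is surrounded by other memory accesses, its cost may be amortized and effectively hidden.
Performance is independent of the processor: the compiler does not know which specific processor you will run on, how big the caches or cache lines are, how fast main memory is, or how good or bad branch prediction is. All of these have a huge impact on performance.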

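What you can do is profile and measure. Profile the application that uses this function and see whether it really needs fixing. Measure the performance you get and try different options.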
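
As an illustration of "measure and try different options", here is a minimal C++ sketch that times the two loop orders for a strided copy. It rests on assumptions that are not in the question: a plain 2D case (row-major source, column-major destination), `float` elements, and the hypothetical names `Order`, `strided_copy` and `time_copy`. It is not a rigorous benchmark harness; it only shows the shape of such a measurement.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical example, not from the original post.
// Which side of the copy is traversed contiguously.
enum class Order { SourceContiguous, DestinationContiguous };

// Copy a rows x cols matrix from a row-major buffer to a column-major buffer.
inline void strided_copy(const std::vector<float>& src, std::vector<float>& dst,
                         std::size_t rows, std::size_t cols, Order order)
{
    assert(src.size() == rows * cols && dst.size() == rows * cols);
    if (order == Order::SourceContiguous) {
        // Inner loop walks a source row: reads are contiguous, writes are strided.
        for (std::size_t r = 0; r < rows; ++r)
            for (std::size_t c = 0; c < cols; ++c)
                dst[c * rows + r] = src[r * cols + c];
    } else {
        // Inner loop walks a destination column: writes are contiguous, reads are strided.
        for (std::size_t c = 0; c < cols; ++c)
            for (std::size_t r = 0; r < rows; ++r)
                dst[c * rows + r] = src[r * cols + c];
    }
}

// Returns the smallest wall-clock time over `repeats` runs, in nanoseconds
// (or -1 if `repeats` is not positive).
inline long long time_copy(const std::vector<float>& src, std::vector<float>& dst,
                           std::size_t rows, std::size_t cols, Order order, int repeats)
{
    using clock = std::chrono::steady_clock;
    long long best = -1;
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        strided_copy(src, dst, rows, cols, order);
        const auto stop = clock::now();
        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

Calling `time_copy` once per `Order` on the target machine, with realistic sizes, an optimized build, and a destination buffer that is actually read afterwards, gives numbers that reflect the cache and memory effects listed above. The result can differ between processors and between array sizes, which is the reason a compile-time estimate cannot replace it.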