Inlining of a recursive function

Keywords: recursive function      Updated: 2023-10-16

When I try to compile this code:

#include <iostream>
#include <limits.h>

// End recursive template-expansion of function select below.
template <typename Type>
static inline constexpr Type select(unsigned index)
{ return Type(); }

// Select one of the items passed to it.
// e.g. select(0, a, b, c) = a; select(1, a, b, c) = b; etc.
template <typename Type, typename... Params>
[[gnu::always_inline]]
static inline constexpr Type select(unsigned index, Type value, Params... values)
{ return index == 0 ? value : select<Type>(index - 1, values...); }

template <typename Type>
[[gnu::always_inline]]
static inline constexpr Type reflect_mask_helper_1(Type mask, Type shift, Type value)
{ return ((value & mask) >> shift) | ((value << shift) & mask); }

template <typename Type>
[[gnu::always_inline]]
static inline constexpr Type reflect_mask_helper_0(unsigned i, Type value)
{
  return i == 0
    ? value
    : reflect_mask_helper_0(
        i - 1,
        reflect_mask_helper_1<Type>(
          select(i - 1, 0xaaaaaaaaaaaaaaaa, 0xcccccccccccccccc, 0xf0f0f0f0f0f0f0f0,
                        0xff00ff00ff00ff00, 0xffff0000ffff0000, 0xffffffff00000000),
          1 << (i - 1),
          value));
}

template <typename Type>
[[gnu::flatten]]
static inline constexpr Type reflect_mask(Type value)
{ return reflect_mask_helper_0(__builtin_ctz(sizeof(Type) * CHAR_BIT), value); }

int main(void) {
  for (int i = 0; i < 65536; i++) {
    std::cout << reflect_mask<uint16_t>(i) << std::endl;
  }
}

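GCC gives me an error saying that the function reflect_mask_helper_0 cannot be inlined because it is recursive. However, the function select is also recursive, yet GCC inlines it without complaint. What am I missing here?

(I need it to be recursive because constexpr functions cannot contain loops under C++11.)

The error message: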

% g++ test.cpp -O3 -march=native -c
test.cpp: In function ‘constexpr Type reflect_mask_helper_0(unsigned int, Type) [with Type = short unsigned int]’:
test.cpp:23:30: error: inlining failed in call to always_inline ‘constexpr Type reflect_mask_helper_0(unsigned int, Type) [with Type = short unsigned int]’: recursive inlining
23 | static inline constexpr Type reflect_mask_helper_0(unsigned i, Type value)
|                              ^~~~~~~~~~~~~~~~~~~~~
test.cpp:27:28: note: called from here
27 |     : reflect_mask_helper_0(
|       ~~~~~~~~~~~~~~~~~~~~~^
28 |         i - 1,
|         ~~~~~~              
29 |         reflect_mask_helper_1<Type>(
|         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30 |           select(i - 1, 0xaaaaaaaaaaaaaaaa, 0xcccccccccccccccc, 0xf0f0f0f0f0f0f0f0,
|           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31 |                         0xff00ff00ff00ff00, 0xffff0000ffff0000, 0xffffffff00000000),
|                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
32 |           1 << (i - 1),
|           ~~~~~~~~~~~~~     
33 |           value));
|           ~~~~~~~           
test.cpp: In function ‘int main()’:
test.cpp:23:30: error: inlining failed in call to always_inline ‘constexpr Type reflect_mask_helper_0(unsigned int, Type) [with Type = short unsigned int]’: recursive inlining
23 | static inline constexpr Type reflect_mask_helper_0(unsigned i, Type value)
|                              ^~~~~~~~~~~~~~~~~~~~~
test.cpp:27:28: note: called from here
27 |     : reflect_mask_helper_0(
|       ~~~~~~~~~~~~~~~~~~~~~^
28 |         i - 1,
|         ~~~~~~              
29 |         reflect_mask_helper_1<Type>(
|         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30 |           select(i - 1, 0xaaaaaaaaaaaaaaaa, 0xcccccccccccccccc, 0xf0f0f0f0f0f0f0f0,
|           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31 |                         0xff00ff00ff00ff00, 0xffff0000ffff0000, 0xffffffff00000000),
|                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
32 |           1 << (i - 1),
|           ~~~~~~~~~~~~~     
33 |           value));
|           ~~~~~~~

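select doesn't actually call itself. It pops the front of the argument list it received and then calls a different specialization of select<Type, ...>, one whose trailing parameter pack is shorter. Since this "recursion" is really a finite chain of nested calls to different functions, GCC can see straight through it, whatever the run-time argument is.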
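As a rough illustration of that chain, the sketch below writes out by hand what a three-value call to select amounts to for Type = int. The names select_0 to select_3 are hypothetical stand-ins for the template specializations; the compiler generates the real ones.

```cpp
// Hand-written stand-ins (hypothetical names) for the specializations
// that select(index, a, b, c) instantiates when Type is int.
// Each function calls a *different* function, never itself.
constexpr int select_0(unsigned)
{ return int(); }

constexpr int select_1(unsigned index, int a)
{ return index == 0 ? a : select_0(index - 1); }

constexpr int select_2(unsigned index, int a, int b)
{ return index == 0 ? a : select_1(index - 1, b); }

constexpr int select_3(unsigned index, int a, int b, int c)
{ return index == 0 ? a : select_2(index - 1, b, c); }

// The chain has a fixed length of four, whatever index is.
static_assert(select_3(0, 10, 20, 30) == 10, "first item");
static_assert(select_3(2, 10, 20, 30) == 30, "last item");
static_assert(select_3(7, 10, 20, 30) == 0, "out of range reaches the base case");
```

Because no function in the chain refers to itself, inlining it only requires substituting each body into its caller once.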

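reflect_mask_helper_0, on the other hand, really does call itself, with the same template arguments every time. GCC cannot tell how deep that recursion will go at run time. Recall that a constexpr function is still a regular function that must be callable at run time.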
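A minimal sketch of that last point, using made-up names (count_steps and count_steps_at_runtime are not part of the question's code): the same constexpr function can be evaluated by the compiler when its arguments are constants, but it must also work for an argument that only exists at run time.

```cpp
// Hypothetical helper with the same self-recursive shape as
// reflect_mask_helper_0: it calls itself, not a different specialization.
constexpr unsigned count_steps(unsigned i, unsigned steps)
{ return i == 0 ? steps : count_steps(i - 1, steps + 1); }

// Constant arguments: the compiler evaluates the whole recursion itself.
static_assert(count_steps(4, 0) == 4, "depth known at compile time");

// Run-time argument: n is unknown here, so the recursion depth is
// unknown too, and the body of count_steps must exist as ordinary code.
unsigned count_steps_at_runtime(unsigned n)
{ return count_steps(n, 0); }
```

In the second use the inliner sees a function whose body contains a call to itself, which is the situation the error message complains about.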

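If you remove the always_inline and flatten attributes and look at the generated assembly, you can see that GCC actually inlines everything properly.

So this is a quality-of-implementation (QoI) issue. Perhaps, at the point where always_inline is processed, the function cannot yet be inlined (hence the error message), but GCC decides to inline it later anyway.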

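By the way, you can fine-tune GCC, and with a small modification to the code it will compile:

  • pass --param max-early-inliner-iterations=3 to GCC
  • remove the flatten attribute (no idea why it matters...)

(So this problem actually has nothing to do with recursive calls as such. From the compiler's point of view it doesn't matter whether a function is recursive or not; it just follows the flow of the code, to a certain extent, of course. Here the recursion depth is only 4, which is not hard for a compiler to follow.)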
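To show where the depth of 4 comes from, here is a sketch of the uint16_t case with the recursion written out by hand. The names swap_stage and reflect16_unrolled are hypothetical; the masks and shifts are the ones the question's code selects for i = 4, 3, 2, 1. Like the question's code, it relies on the GCC/Clang builtin __builtin_ctz.

```cpp
#include <climits>
#include <cstdint>

// 16 bits give ctz(16) == 4 swap stages, hence a recursion depth of 4.
static_assert(__builtin_ctz(sizeof(std::uint16_t) * CHAR_BIT) == 4,
              "a 16-bit type needs four stages");

// Same arithmetic as reflect_mask_helper_1, fixed to uint16_t.
constexpr std::uint16_t swap_stage(std::uint16_t mask, unsigned shift,
                                   std::uint16_t value)
{
  return static_cast<std::uint16_t>(
      ((value & mask) >> shift) | ((value << shift) & mask));
}

// The four recursive steps, unrolled: bytes, nibbles, bit pairs, bits.
constexpr std::uint16_t reflect16_unrolled(std::uint16_t value)
{
  return swap_stage(0xaaaa, 1,
         swap_stage(0xcccc, 2,
         swap_stage(0xf0f0, 4,
         swap_stage(0xff00, 8, value))));
}

static_assert(reflect16_unrolled(0x0001) == 0x8000, "lowest bit becomes highest");
static_assert(reflect16_unrolled(0x1234) == 0x2c48, "full 16-bit reversal");
```

This is roughly the straight-line code one would hope to get after inlining; whether GCC produces exactly this is not something the sketch demonstrates.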

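Here is the solution I found, thanks to grek40's comment and StoryTeller's answer.

(As for my earlier problem with unused function template instances being left in the compiled binary, I solved it by compiling the original code, without the gnu::always_inline and gnu::flatten attributes, with the arguments -ffunction-sections -fdata-sections -Wl,--gc-sections.)

Now reflect_mask_helper_0 lives inside a struct (because C++ does not allow partial specialization of function templates), and the function's i parameter has become the Index parameter of the struct template.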

#include <cstdint>
#include <iostream>
#include <limits.h>

// End recursive template-expansion of function select below.
template <typename Type>
static inline constexpr Type select(unsigned index)
{ return Type(); }

// Select one of the items passed to it.
// e.g. select(0, a, b, c) = a; select(1, a, b, c) = b; etc.
template <typename Type, typename... Params>
[[gnu::always_inline]]
static inline constexpr Type select(unsigned index, Type value, Params... values)
{ return index == 0 ? value : select<Type>(index - 1, values...); }

template <typename Type>
[[gnu::always_inline]]
static inline constexpr Type reflect_mask_helper_1(Type mask, Type shift, Type value)
{ return ((value & mask) >> shift) | ((value << shift) & mask); }

// Each Index is a distinct specialization, so invoke never calls itself.
template <typename Type, unsigned Index>
struct reflect_mask_helper_0
{
  [[gnu::always_inline]]
  static inline constexpr Type invoke(Type value)
  {
    return reflect_mask_helper_0<Type, Index - 1>::invoke(
      reflect_mask_helper_1<Type>(
        static_cast<Type>(select(Index - 1,
          0xaaaaaaaaaaaaaaaa, 0xcccccccccccccccc, 0xf0f0f0f0f0f0f0f0,
          0xff00ff00ff00ff00, 0xffff0000ffff0000, 0xffffffff00000000)),
        1 << (Index - 1),
        value));
  }
};

// Index == 0 ends the recursion.
template <typename Type>
struct reflect_mask_helper_0<Type, 0>
{
  [[gnu::always_inline]]
  static inline constexpr Type invoke(Type value) { return value; }
};

template <typename Type>
static inline constexpr Type reflect_mask(Type value)
{ return reflect_mask_helper_0<Type, __builtin_ctz(sizeof(Type) * CHAR_BIT)>::invoke(value); }

int main(void) {
  for (int i = 0; i < 65536; i++) {
    std::cout << reflect_mask<uint16_t>(i) << std::endl;
  }
}