Template version and non-template version of the same function

Keywords: version, function. Updated: 2023-10-16

Consider the following simple function:

void foo_rt(int n) {
    for(int i=0; i<n; ++i) {
        // ... do something relatively cheap ...
    }
}

If I know the parameter n at compile time, I can write a template version of the same function:

template<int n>
void foo_ct() {
    for(int i=0; i<n; ++i) {
        // ... do something relatively cheap ...
    }
}

This lets the compiler perform optimizations such as loop unrolling, which improves speed.

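But now suppose that sometimes I know n at compile time and sometimes only at run time. How can I implement this without maintaining two versions of the function? I was thinking of something like this: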

inline void foo(int n) {
    for(int i=0; i<n; ++i) {
        // ... do something relatively cheap ...
    }
}
// Runtime version
void foo_rt(int n) { foo(n); }
// Compiletime version
template<int n>
void foo_ct() { foo(n); }

But I'm not sure all compilers are smart enough to handle this. Is there a better way?

Edit:

Obviously, one solution that works is to use a macro, but I would really like to avoid that:

#define foo_body \
{ \
    for(int i=0; i<n; ++i) { \
        /* ... do something relatively cheap ... */ \
    } \
}
// Runtime version
void foo_rt(int n) foo_body
// Compiletime version
template<int n>
void foo_ct() foo_body

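I've done this before, using an integral_variable type and std::integral_constant. It looks like a lot of code, but if you look again, it's really just a series of four very simple pieces, one of which is only demo code.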

#include <type_traits>
//type for acting like integral_constant but with a variable
template<class underlying>
struct integral_variable {
    const underlying value;
    integral_variable(underlying v) :value(v) {}
}; 
//generic function
template<class value> 
void foo(value n) {
    for(int i=0; i<n.value; ++i) {
        // ... do something relatively cheap ...
    }
} 
//optional: specialize so callers don't have to do casts
void foo_rt(int n) { return foo(integral_variable<int>(n)); }
template<int n>
void foo_ct() { return foo(std::integral_constant<unsigned, n>()); }
//notice it even handles different underlying types.  Doesn't care.
//usage is simple
int main() {
    foo_rt(3);
    foo_ct<17>();
}

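As much as I appreciate the DRY principle, I don't think there's a way around writing it twice.

Although the code is the same, these are two very different operations: working with a known value versus working with an unknown value.

You want the known case on a fast track for optimizations that the unknown case may not be eligible for.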

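What I would do is factor all the code that doesn't depend on n into another function (hopefully the entire body of your for loop), and then have both the templated and non-templated versions call it inside their loops. That way the only thing you repeat is the structure of the for loop, which I don't think is a big deal.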
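A minimal sketch of that factoring might look like the following. The helper name foo_body and the accumulator it updates are assumptions made up for illustration (the original loop body is elided), and the functions return the accumulated value only so the result can be checked.

```cpp
// Hypothetical stand-in for the cheap loop body; it does not depend on n.
inline void foo_body(int i, long& acc) { acc += i; }

// Runtime version: n is an ordinary argument.
inline long foo_rt(int n) {
    long acc = 0;
    for (int i = 0; i < n; ++i) foo_body(i, acc);
    return acc;
}

// Compile-time version: n is a template parameter.
template<int n>
long foo_ct() {
    long acc = 0;
    for (int i = 0; i < n; ++i) foo_body(i, acc);
    return acc;
}
```

Calling foo_rt(4) and foo_ct<4>() should give the same result; only the two-line loop header is duplicated.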

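If the value is known at compile time, routing it through a template as a template parameter doesn't make it any more known at compile time. I think it's quite unlikely that any compiler would inline and optimize a function only because the variable is a template parameter rather than some other kind of compile-time constant.

Depending on your compiler, you may not even need two versions of the function. An optimizing compiler may well be able to optimize a function that is called with constant-expression arguments on its own. For example: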

extern volatile int *I;
void foo(int n) {
    for (int i=0;i<n;++i)
        *I = i;
}
int main(int argc,char *[]) {
    foo(4);
    foo(argc);
}

My compiler turns this into an inlined, unrolled loop from 0 to 3, followed by an inlined loop over argc:

main:                                   # @main
# BB#0:                                 # %entry
        movq    I(%rip), %rax
        movl    $0, (%rax)
        movl    $1, (%rax)
        movl    $2, (%rax)
        movl    $3, (%rax)
        testl   %ecx, %ecx
        jle     .LBB1_3
# BB#1:                                 # %for.body.lr.ph.i
        xorl    %eax, %eax
        movq    I(%rip), %rdx
        .align  16, 0x90
.LBB1_2:                                # %for.body.i4
                                        # =>This Inner Loop Header: Depth=1
        movl    %eax, (%rdx)
        incl    %eax
        cmpl    %eax, %ecx
        jne     .LBB1_2
.LBB1_3:                                # %_Z3fooi.exit5
        xorl    %eax, %eax
        ret

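To get optimizations like this, you need to make sure the definition is available to all translation units (for example, by defining the function as inline in a header file), or have a compiler that performs link-time optimization.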
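As a rough sketch of the header approach: the file name foo.hpp and the out parameter are assumptions for illustration, with the out parameter standing in for the global I used in the example above.

```cpp
// foo.hpp (hypothetical file name)
#ifndef FOO_HPP
#define FOO_HPP

// `inline` allows this definition to appear in every translation unit that
// includes the header, so the optimizer can see the body at each call site.
inline void foo(volatile int* out, int n) {
    for (int i = 0; i < n; ++i)
        *out = i;
}

#endif // FOO_HPP
```

Whether a call such as foo(&x, 4) actually gets unrolled still depends on the compiler and its optimization settings.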

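If you use this and you really depend on something being computed at compile time, you should have automated tests to verify that it actually happens.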

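C++11 provides constexpr, which lets you write a function that can be evaluated at compile time when given constant-expression arguments, and lets you require that a value be computed at compile time. There are restrictions on what can go in a constexpr function, which may make your function hard to implement as constexpr, but the permitted language is apparently Turing complete. One issue is that while these restrictions guarantee that the computation can be done at compile time when given constexpr arguments, they can lead to an inefficient implementation when the arguments are not constexpr.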
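Here is a small sketch of what that can look like under the C++11 rules. The function sum_below is a made-up example, not taken from the question; it adds up 0 to n-1 the way the loop would, and it has to use recursion because a C++11 constexpr body is limited to a single return statement.

```cpp
// C++11 constexpr: the body must be a single return statement, hence recursion.
constexpr int sum_below(int n) {
    return n <= 0 ? 0 : (n - 1) + sum_below(n - 1);
}

// Using the result where a constant expression is required forces compile-time
// evaluation, so this static_assert also works as an automated check.
static_assert(sum_below(4) == 6, "sum_below(4) should be 0+1+2+3");

// With a runtime argument the same function is an ordinary recursive call,
// which may be slower than a hand-written loop.
inline int sum_below_runtime(int n) { return sum_below(n); }
```

The recursive form is the kind of inefficiency mentioned above: it is fine at compile time, but for runtime arguments you depend on the optimizer to turn it back into a loop.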

Why not something like this:

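void blah() { /* ... do something relatively cheap ... */ }
template<typename getter>
void f(getter g)
{
   for (int i = 0; i < g.get(); i++) { blah(); }
}
// Compile-time count: get() is a constant expression.
struct getter1 { constexpr int get() const { return 1; } };
// Runtime count: the value is stored at construction.
struct getterN {
    explicit getterN(int n) : n_(n) {}
    int get() const { return n_; }
    int n_;
};
int main() {
    f<getter1>(getter1());
    f<getterN>(getterN(100));
}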

I would point out that if you have:

// Runtime version
void foo_rt(int n){ foo(n);}

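and this works for you, then you actually do know the type at compile time. At the very least, you know a type that is covariant with it, which is all you need to know. You can just use the templated version. If necessary, you can specify the type at the call site, like this: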

foo_rt<int>()