Why is this local recursive C++ lambda so terribly slow?

Updated: 2023-10-16

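To understand it better, I am using a local lambda with a recursive (tail-recursive) call. Running it (for example on http://cpp.sh/ or https://coliru.stacked-crooked.com/) always shows that the lambda call is much slower than the other solution.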

My question is: why?

#include <iostream>
#include <chrono>
#include <functional>

// tail-recursive lambda, stored in a std::function so it can call itself
long long int Factorial5(long long int n)
{
    std::function<long long int(long long int, long long int)> aux
        = [&aux](long long int n, long long int acc)
        {
            return n < 1 ? acc : aux(n - 1, acc * n);
        };
    return aux(n, 1);
}

// tail-recursive static member function of a local class
long long int Factorial6(long long int n)
{
    class aux {
    public:
        inline static long long int tail(long long int n, long long int acc)
        {
            return n < 1 ? acc : tail(n - 1, acc * n);
        }
    };
    return aux::tail(n, 1);
}

int main()
{
    int v = 55;
    {
        auto t1 = std::chrono::high_resolution_clock::now();
        auto result = Factorial5(v);
        auto t2 = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> ms = t2 - t1;
        std::cout << std::fixed << "lamda(5)\tresult " << result
                  << " took " << ms.count() << " ms\n";
    }
    {
        auto t1 = std::chrono::high_resolution_clock::now();
        auto result = Factorial6(v);
        auto t2 = std::chrono::high_resolution_clock::now();
        std::chrono::duration<double, std::milli> ms = t2 - t1;
        std::cout << std::fixed << "inner class(6)\tresult " << result
                  << " took " << ms.count() << " ms\n";
    }
}

Output:

lamda(5)        result 6711489344688881664 took 0.076737 ms
inner class(6)  result 6711489344688881664 took 0.000140 ms

A lambda is not a std::function. Every lambda has its own unique type. It looks roughly like this:

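struct /* unnamed */ {
    // [&aux] captures the std::function by reference
    std::function<long long int(long long int, long long int)>& aux;

    long long int operator()(long long int n, long long int acc) const
    {
        return n < 1 ? acc : aux(n - 1, acc * n);
    }
};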

So in principle a lambda really is fast: calling it is as fast as calling any function or non-virtual member function.
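A small compile-time check of the two claims above, as a sketch (`mul_a`, `mul_b` and `CallErased` are hypothetical names): two lambdas with identical bodies still have different closure types, neither of them is a `std::function`, and putting one into a `std::function` is a separate, type-erasing conversion.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Two lambdas with identical bodies still get two distinct, unnamed closure types.
auto mul_a = [](long long int a, long long int b) { return a * b; };
auto mul_b = [](long long int a, long long int b) { return a * b; };

static_assert(!std::is_same<decltype(mul_a), decltype(mul_b)>::value,
              "every lambda expression has its own closure type");
static_assert(!std::is_same<decltype(mul_a),
                            std::function<long long int(long long int, long long int)>>::value,
              "a lambda is not a std::function");

// Calling mul_a directly is an ordinary call to its operator(), whose target is
// known at compile time. Storing it in a std::function erases that type, and
// calls then go through the wrapper's indirection.
long long int CallErased(long long int a, long long int b)
{
    std::function<long long int(long long int, long long int)> f = mul_a;
    assert(f && "a std::function holding a lambda is non-empty");
    return f(a, b);
}
```

Both `mul_a(6, 7)` and `CallErased(6, 7)` compute the same product; only the call path differs.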

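The slowness comes from std::function, which is a type-erasing wrapper around the lambda. It turns the function call into an indirect (typically virtual) call and may allocate the lambda dynamically. This is expensive and prevents inlining.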
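To make that cost concrete, here is a deliberately simplified sketch of type erasure. `MiniFunction` and `FactorialViaMiniFunction` are hypothetical names, and this is not how any particular standard library implements `std::function` (real implementations typically add a small-buffer optimization, among other things). It only shows the shape of the call path: the wrapper owns a heap-allocated copy of the lambda behind a base-class pointer, and every call goes through a virtual function whose target the caller cannot see.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, heavily simplified stand-in for
// std::function<long long int(long long int, long long int)>.
class MiniFunction {
    // Type-erased interface: the caller only ever sees this base class.
    struct Callable {
        virtual ~Callable() = default;
        virtual long long int call(long long int n, long long int acc) const = 0;
    };

    // One concrete holder type is generated per stored callable type.
    template <class F>
    struct Holder final : Callable {
        F f;
        explicit Holder(F fn) : f(std::move(fn)) {}
        long long int call(long long int n, long long int acc) const override
        {
            return f(n, acc);
        }
    };

    std::unique_ptr<Callable> impl_;  // heap allocation of the wrapped lambda

public:
    template <class F>
    MiniFunction(F f) : impl_(std::make_unique<Holder<F>>(std::move(f))) {}

    long long int operator()(long long int n, long long int acc) const
    {
        assert(impl_ && "calling an empty MiniFunction");
        return impl_->call(n, acc);  // virtual call: target unknown to the caller
    }
};

// Same shape as Factorial5, with MiniFunction in place of std::function:
// lambda -> MiniFunction::operator() -> Holder::call (virtual) -> lambda -> ...
long long int FactorialViaMiniFunction(long long int n)
{
    MiniFunction aux = [&aux](long long int k, long long int acc) -> long long int {
        return k < 1 ? acc : aux(k - 1, acc * k);
    };
    return aux(n, 1);
}
```

Every recursive step has to pass through the virtual `call`, so the optimizer generally cannot see from one step to the next, which is exactly what blocks inlining and tail-call elimination here.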

To write a recursive lambda without std::function, you can use a C++14 generic lambda and pass the lambda to itself:

auto aux = [](auto aux, long long int n, long long int acc) -> long long int
{
    return n < 1 ? acc : aux(aux, n - 1, acc * n);
};
return aux(aux, n, 1);
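Wrapped into a complete function alongside `Factorial5` and `Factorial6`, the generic-lambda version might look like this (`Factorial7` is a hypothetical name, and the non-negative precondition is an added assumption). The explicit `-> long long int` return type matters: without it, the compiler would have to deduce the lambda's return type from a body that calls the very specialization whose return type is still being deduced, which is an error.

```cpp
#include <cassert>

// Hypothetical Factorial7: same tail-recursive scheme as Factorial5,
// but without std::function. The generic lambda receives itself as `self`.
long long int Factorial7(long long int n)
{
    assert(n >= 0 && "factorial of a negative number");
    auto aux = [](auto self, long long int k, long long int acc) -> long long int
    {
        // `self` is a copy of the capture-less closure, so this call has a
        // statically known target that the optimizer is free to inline.
        return k < 1 ? acc : self(self, k - 1, acc * k);
    };
    return aux(aux, n, 1);
}
```

Because the closure's type is fully known at the call site, this version gives the compiler the same optimization opportunities as the local-class `tail` function in `Factorial6`.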

For what it's worth, the results on my system are:

lamda(5)        result 6711489344688881664 took 0.003501 ms
inner class(6)  result 6711489344688881664 took 0.002312 ms

That is far less dramatic, but still a measurable difference.

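The reason for the difference is that the function does almost nothing except make function calls, so any per-call overhead dominates the result. std::function has such overhead, and it shows up in the measurement.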
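Because a single call of such a short recursion takes only microseconds or less, a one-shot timing like the ones above can easily be swamped by timer overhead and one-off effects such as cold caches. A steadier way to compare the per-call cost is to repeat the call many times and average. The sketch below assumes a hypothetical helper `AverageMs` and uses `std::chrono::steady_clock`, which is monotonic.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs f(arg) `reps` times and returns the average time
// per call in milliseconds.
template <class F>
double AverageMs(F f, long long int arg, int reps)
{
    assert(reps > 0 && "need at least one repetition");
    // volatile input/output: each iteration must re-read the argument and
    // store the result, so the compiler cannot hoist or drop the calls.
    volatile long long int input = arg;
    volatile long long int sink = 0;
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i)
        sink = f(input);
    auto t2 = std::chrono::steady_clock::now();
    (void)sink;
    std::chrono::duration<double, std::milli> total = t2 - t1;
    return total.count() / reps;
}
```

For example, compare `AverageMs(Factorial5, 20, 100000)` with `AverageMs(Factorial6, 20, 100000)`. An argument of 20 keeps every intermediate product within `long long int`, and the absolute numbers will still depend on the compiler and optimization flags.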

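Note that the example program overflows long long, so the printed result is meaningless: signed integer overflow is undefined behavior. The correct value of 55! is about 1.2696e+73. In theory, UB also invalidates the runtime measurement, although in practice it does not necessarily do so.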
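If a correct result matters, the overflow can be detected before it happens. A minimal sketch, assuming a hypothetical `CheckedFactorial` that returns `std::nullopt` instead of overflowing (written as a loop for brevity, not as a recursive lambda). With the usual 64-bit `long long int`, the largest factorial that fits is 20! = 2432902008176640000, so 55! is far out of range.

```cpp
#include <cassert>
#include <limits>
#include <optional>

// Hypothetical overflow-checked factorial: std::nullopt means "does not fit".
std::optional<long long int> CheckedFactorial(long long int n)
{
    assert(n >= 0 && "factorial of a negative number");
    long long int acc = 1;
    for (long long int k = 2; k <= n; ++k) {
        // For positive acc and k, acc * k overflows exactly when acc > max / k.
        if (acc > std::numeric_limits<long long int>::max() / k)
            return std::nullopt;
        acc *= k;
    }
    return acc;
}
```

Checking before multiplying keeps the function free of undefined behavior for every non-negative input.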

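The lambda is an object of an unnamed type that calls a different object (the std::function's operator()), which calls the lambda, which calls the std::function, and so on.

That is, there is mutual recursion and indirection through objects, which makes this hard to optimize.
Your code is more or less equivalent to this (I have shortened the types to int):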

struct Fun;

// Stands in for the lambda: it holds a reference to the std::function.
struct Aux
{
    const Fun& fun;
    int actual_work(int n, int acc) const;
};

// Stands in for std::function: it owns its own copy of the lambda.
struct Fun
{
    Aux aux;
    int work(int n, int acc) const
    {
        return aux.actual_work(n, acc);
    }
};

int Aux::actual_work(int n, int acc) const
{
    return n < 1 ? acc : fun.work(n - 1, acc * n);
}

int factorial5(int n)
{
    Fun fun{Aux{fun}};  // self-reference, like [&aux] in the question
    return fun.work(n, 1);
}

This shows more clearly that a lot is going on behind the scenes.