Dynamic function resolution at runtime

Keywords: function, dynamic, runtime. Last updated: 2023-10-16

My project needs to load many modules at runtime, and each module contains many functions that look like the following pseudocode:

void someFunction(Context &ctx) {
    bool result;
    result = ctx.call("someFunction2")(ctx.arg["arg1"], ctx.arg["arg2"])
             && ctx.call("someFunction3")(ctx.arg["arg1"], ctx.arg["arg3"]);
    ctx.result(result);
}

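Here ctx.arg["arg1"], ctx.arg["arg2"] and ctx.arg["arg3"] are arguments passed to someFunction at runtime. someFunction2 and someFunction3 cannot be resolved statically at compile time, but they will be known at runtime once all modules have been loaded (regardless of whether they are defined in other modules).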

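Now, a naive implementation would use a hash map to store the handles of all these functions, but hashing would be slow because there are typically 10k functions to search, and each function is called many times from other functions (for example, when enumerating arguments to find the combination that produces the desired result).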
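For reference, the naive baseline described above might look like the following minimal sketch. The names Registry, Predicate, add, call and demo_naive are hypothetical, and the arguments are simplified to plain ints; the only point is that every call hashes the name string and probes the table again.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical predicate type: two int arguments, boolean result.
using Predicate = std::function<bool(int, int)>;

// Naive registry: every call() hashes the name and probes the table.
class Registry {
public:
    void add(const std::string& name, Predicate fn) {
        table_[name] = std::move(fn);
    }

    // Looks the function up by name on each invocation.
    bool call(const std::string& name, int a, int b) const {
        auto it = table_.find(name);
        if (it == table_.end())
            throw std::out_of_range("unknown function: " + name);
        return it->second(a, b);
    }

private:
    std::unordered_map<std::string, Predicate> table_;
};

// Builds a small registry and evaluates a conjunction through it.
inline bool demo_naive(int x, int y) {
    Registry r;
    r.add("less", [](int a, int b) { return a < b; });
    r.add("different", [](int a, int b) { return a != b; });
    return r.call("less", x, y) && r.call("different", x, y);
}
```

Calling demo_naive(1, 2) performs two string hashes and two table probes, which is exactly the per-call cost the question wants to pay only once.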

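Therefore, I am looking for a solution that performs a one-time substitution of these "ctx.call" sites once all modules are loaded, instead of doing "hash and probe" on every call. The main problem right now is the "substitution" step. I have come up with some ideas, but none of them is perfect: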

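First solution: create an inner function inner_func(func_handle1, func_handle2, arg1, arg2, arg3) and use std::bind to create an outer wrapper outer_wrapper().

Problem: not user-friendly; the context must be told explicitly which functions and arguments to look up.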
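A minimal sketch of what the first solution could look like, under some assumptions: Pred, make_outer_wrapper and demo_bind are hypothetical names, arguments are plain ints, and the wrapper still takes the three arguments as placeholders (the post does not say whether they are bound as well).

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical handle type for an already-resolved function.
using Pred = std::function<bool(int, int)>;

// inner_func receives its two dependencies explicitly, then the arguments.
inline bool inner_func(const Pred& f2, const Pred& f3,
                       int arg1, int arg2, int arg3) {
    return f2(arg1, arg2) && f3(arg1, arg3);
}

// Binds the resolved handles once; the wrapper then takes only arguments.
inline std::function<bool(int, int, int)> make_outer_wrapper(Pred f2, Pred f3) {
    using namespace std::placeholders;
    return std::bind(inner_func, std::move(f2), std::move(f3), _1, _2, _3);
}

// Example: f2 is "less than", f3 is "different".
inline bool demo_bind(int a, int b, int c) {
    auto outer = make_outer_wrapper(
        [](int x, int y) { return x < y; },
        [](int x, int y) { return x != y; });
    return outer(a, b, c);
}
```

The drawback named above is visible here: whoever calls make_outer_wrapper has to know, and spell out, which handles each function needs.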

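Second solution: use metaprogramming + constexpr + macros to automatically collect the referenced function and argument names, build a reference table from them, and let the context fill in each table at runtime.

Problem: I could not work this out and need some help. I have read the documentation of the Fatal library from Facebook and of mpl and hana from Boost, but there does not seem to be a clean way to do this.

Third solution: use a JIT compiler.

Problem: the choice of C++ JIT compilers is limited. NativeJIT is not powerful enough, easy::JIT does not seem to be customizable and is not easy to distribute, and AsmJit is not usable.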

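PS: the context of this problem is an "automatic planner", and these functions are used to construct predicates. Context ctx is only an example; if necessary you can use any other suitable syntax, as long as it makes it easy to express the following Lisp expression: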

(and (at ?p ?c1)
     (aircraft ?a)
     (at ?a ?c3)
     (different ?c1 ?c3))

PPS: to be more specific, I am thinking of something like this:

The user would define a function like this:

void module_init() {
    FUNC ("someFunction")("p", "a", "c1", "c3") (
        bool result;
        result = CALL("at")("p", "c1")
                 && CALL("aircraft")("a")
                 && CALL("at")("a", "c3")
                 && CALL("different")("c1", "c3");
        /// Users should also be able to access arguments as a "Variable"
        /// class using ARG["p"]
        return result;
    )
}

Then, by some means, FUNC() would be converted into a functor similar to the following:

struct func_someFunction {
    some_vector<std::function<bool()>> functions;
    some_vector<Variable*> args;
    some_vector<std::string> func_name;
    some_vector<std::string> arg_name;
    bool operator()() {
        /// above representation of FUNC(), but functions and args are pointers in "functions" and "args"
    }
};

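Then, once all modules have been loaded, the system reads func_name and arg_name and fills functions and args with the appropriate function pointers and variable pointers, respectively.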

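~~Status: going with a hash map first; I will post an update when it is done.~~

Status: I came up with a solution myself and also tested the hash implementation; both are posted below.

Any ideas would be greatly appreciated. Thanks!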

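"Now, a naive implementation would use a hash map to store the handles of all these functions, but hashing would be slow because there are typically 10k functions to search [...]"

Hash tables have O(1) lookup cost on average. Have you tried this widely used solution and profiled it? Have you tried different hash algorithms to reduce hashing time and collisions?

If you need to keep finding the right function to run from a runtime string key throughout the lifetime of the program, then there is no way around using a hash map. (Paul's answer)

However, if at runtime you initialize a list of functions that will not change for the duration of the program (that is, you do not need any "lookup" operation after the initial phase), then you can put these functions into a contiguous container (such as std::vector) to improve access time and cache utilization: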

// getFuncNames is where you are deciding on the list of functions to run
// funcs is a vector of function handles
// funcMap is a hash map of function names to function handles
for (auto& funcName : getFuncNames())
{
    funcs.push_back(funcMap.at(funcName));
}
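A self-contained version of this idea could look like the sketch below. The Func signature, the resolve and runAll helpers, and the "inc"/"dbl" functions are assumptions made for illustration; only the pattern (look each name up once, then iterate a vector) comes from the answer.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical handle type: takes and returns an int.
using Func = std::function<int(int)>;

// One-time resolution: look each name up once and keep the handles
// in a contiguous vector, in the order the names were given.
// Throws std::out_of_range (from unordered_map::at) on an unknown name.
inline std::vector<Func> resolve(
    const std::unordered_map<std::string, Func>& funcMap,
    const std::vector<std::string>& names) {
    std::vector<Func> funcs;
    funcs.reserve(names.size());
    for (const auto& name : names)
        funcs.push_back(funcMap.at(name));
    return funcs;
}

// Hot path: no hashing, just walk the vector.
inline int runAll(const std::vector<Func>& funcs, int x) {
    for (const auto& f : funcs) x = f(x);
    return x;
}

inline int demo_resolve() {
    std::unordered_map<std::string, Func> funcMap{
        {"inc", [](int v) { return v + 1; }},
        {"dbl", [](int v) { return v * 2; }},
    };
    auto funcs = resolve(funcMap, {"inc", "dbl", "inc"});
    return runAll(funcs, 3);  // ((3 + 1) * 2) + 1
}
```

After resolve returns, funcMap is no longer touched; repeated calls to runAll only pay for the indirect calls themselves.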

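This might be overkill, but it may be a useful idea:

Use string interning to ensure that every MyString("aircraft") yields the same object. Of course, this means your strings must be immutable.
Associate each string object that gets created with a high-quality random number (uint64_t), and use that as the "hash" of the string.

Since the "hash" is stored with the string, "computing" it is a simple memory load. And since you use a good PRNG to generate this "hash", it behaves excellently as a key into a hash table.

Whenever a std::string is converted into an interned string object, you still need to compute a classic hash to look up the MyString object in the table of existing string objects, but that is one-time work which can be done while the lexer processes the configuration file or while the modules are loaded. The actual matching of strings to their respective function implementations and so on is then decoupled from the computation of the classic hash.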
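The two steps above could be sketched as follows. This is only an illustration under stated assumptions: Interned, Interner, InternedHash, FuncTable and demo_intern are hypothetical names standing in for the MyString idea, and std::mt19937_64 is used as the PRNG.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <random>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical interned-string record: immutable text plus a random 64-bit id.
struct Interned {
    std::string text;
    std::uint64_t hash;  // drawn once from a PRNG, not computed from text
};

class Interner {
public:
    // Returns the unique record for s, creating it on first sight.
    // This is the only place where the classic string hash is computed.
    const Interned* intern(const std::string& s) {
        auto it = pool_.find(s);
        if (it == pool_.end()) {
            auto rec = std::make_unique<Interned>(
                Interned{s, static_cast<std::uint64_t>(rng_())});
            it = pool_.emplace(s, std::move(rec)).first;
        }
        return it->second.get();
    }

private:
    std::mt19937_64 rng_{std::random_device{}()};
    std::unordered_map<std::string, std::unique_ptr<Interned>> pool_;
};

// Hasher that just loads the stored random number.
struct InternedHash {
    std::size_t operator()(const Interned* p) const noexcept {
        return static_cast<std::size_t>(p->hash);
    }
};

using Handle = bool (*)(int);
using FuncTable = std::unordered_map<const Interned*, Handle, InternedHash>;

inline bool isPositive(int v) { return v > 0; }

inline bool demo_intern() {
    Interner in;
    const Interned* a = in.intern("aircraft");
    const Interned* b = in.intern(std::string("air") + "craft");
    FuncTable table;
    table[a] = &isPositive;
    // Same text yields the same object, so b finds the entry stored under a.
    return a == b && table.at(b)(5);
}
```

Lookups in FuncTable compare pointers and never touch the string bytes, so the cost no longer depends on the length of the name.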

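OK, so I came up with a solution myself, close to the first solution in my question. I made a very simple example of the problem and posted it on github; the link is below:

Demo using a hash table and pointers, respectively

Note: this solution is just a simple demo and is not optimized. Further possible optimizations include:

For the hash map method, string interning can be used to reduce string construction overhead, as suggested by Konrad Rudolph and cmaster - reinstate monica. It still incurs a moderate performance penalty (about 50% compared with pointers), but it eliminates the dynamic string creation overhead and reduces memory consumption. boost::flyweight is a good option.
For the hash map method, I only implemented the demo with std::unordered_map, but better replacements exist, including google::dense_hash_map, tsl::hopscotch_map and so on. They are worth trying, but judging from Tessil's benchmarks, I doubt that their O(s) time per lookup (where s is the average string length) could be faster than O(1) pointer access.
In my scenario, all functions can be found after the module loading stage. However, you may want to cover a scenario such as symbol lookup in Python; in that case a hash map would be better, unless you introduce more constraints into your scenario or update the resolved pointers periodically. If you insert and delete entries on a large scale, a trie data structure may be a good option.

Enough chatter, here are the results and the solution: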

Performance

Benchmark: 1.28e8 possible combinations of a mixed boolean and numeric SAT problem

Platform: i7 6700HQ, single thread

cmake-build-debug/test_ptr  0.70s user 0.00s system 99% cpu 0.697 total
cmake-build-debug/test_hash  4.24s user 0.00s system 99% cpu 4.241 total

Hotspots and function runtimes from perf:

test_ptr:

53.17%  test_ptr  test_ptr       [.] main
35.38%  test_ptr  test_ptr       [.] module_1_init(Domain&)::__internal_func_some_circuit::operator()
8.02%  test_ptr  test_ptr       [.] module_2_init(Domain&)::__internal_func_and_circuit::operator()
1.90%  test_ptr  test_ptr       [.] module_2_init(Domain&)::__internal_func_or_circuit::operator()
0.18%  test_ptr  libc-2.23.so   [.] _int_malloc
0.15%  test_ptr  ld-2.23.so     [.] do_lookup_x
0.15%  test_ptr  test_ptr       [.] module_2_init(Domain&)::__internal_func_xor_circuit::operator()

test_hash:

33.11%  test_hash  test_hash            [.] Domain::call<char const (&) [11], Domain::Variable&, Domain::Variable&>
25.37%  test_hash  test_hash            [.] main
21.46%  test_hash  libstdc++.so.6.0.26  [.] std::_Hash_bytes
5.10%  test_hash  libc-2.23.so         [.] __memcmp_sse4_1
4.64%  test_hash  test_hash            [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>
3.41%  test_hash  test_hash            [.] module_1_init(Domain&)::__internal_func_some_circuit::operator()
1.86%  test_hash  libc-2.23.so         [.] strlen
1.44%  test_hash  test_hash            [.] module_2_init(Domain&)::__internal_func_and_circuit::operator()
1.39%  test_hash  libc-2.23.so         [.] __memcpy_avx_unaligned
0.55%  test_hash  test_hash            [.] std::_Hash_bytes@plt

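The hash map implementation has very high overhead, which comes from the repeated hashing and function lookups.

Solution

Macros are used heavily to make it easier for users to define functions (predicates):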

in test_ptr:
void module_1_init(Domain &d) {
    FUNC(some_circuit, d,
        DEP(and_circuit, or_circuit, xor_circuit, not_circuit),
        ARG(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10),
        BODY(
            return CALL(and_circuit, a1, a2)
                && CALL(or_circuit, a3, a4)
                && CALL(xor_circuit, a5, a6)
                && CALL(not_circuit, a7)
                && a8.value >= R1 && a9.value >= R2 && a10.value >= R3;
        )
    );
}

in test_hash:
void module_1_init(Domain &d) {
    FUNC(some_circuit, d,
        ARG(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10),
        BODY(
            return CALL(and_circuit, a1, a2)
                && CALL(or_circuit, a3, a4)
                && CALL(xor_circuit, a5, a6)
                && CALL(not_circuit, a7)
                && a8.value >= R1 && a9.value >= R2 && a10.value >= R3;
        )
    );
}

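The main difference is the DEP() macro in the pointer solution: DEP() explicitly specifies the functions that this function depends on and builds a local function pointer table.

Here is the code actually generated after macro expansion: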

in test_ptr:
void module_1_init(Domain &d) {
    class __internal_func_some_circuit : public Domain::Function {
    public:
        enum func_dep_idx {
            and_circuit,
            or_circuit,
            xor_circuit,
            not_circuit,
            __func_dep_idx_end };
        Domain::Variable a1;
        Domain::Variable a2;
        ...
        Domain::Variable a10;
        explicit __internal_func_some_circuit(Domain &d) :
            a1(), a2(), a3(), a4(), a5(), a6(), a7(), a8(), a9(), a10(),
            Domain::Function(d) {
            arg_map = {{"a1", &a1}, {"a2", &a2}, {"a3", &a3} ..., {"a10", &a10}};
            arg_pack = { &a1, &a2, &a3, &a4, &a5, &a6, &a7, &a8, &a9, &a10};
            func_dep_map = {{"and_circuit", func_dep_idx::and_circuit},
                            {"or_circuit", func_dep_idx::or_circuit},
                            {"xor_circuit", func_dep_idx::xor_circuit},
                            {"not_circuit", func_dep_idx::not_circuit}};
            func_dep.resize(__func_dep_idx_end);
        }
        bool operator()() override {
            return func_dep[func_dep_idx::and_circuit]->call(a1, a2) &&
                   func_dep[func_dep_idx::or_circuit]->call(a3, a4) &&
                   func_dep[func_dep_idx::xor_circuit]->call(a5, a6) &&
                   func_dep[func_dep_idx::not_circuit]->call(a7) &&
                   a8.value >= 100 && a9.value >= 100 && a10.value >= 100;
        }
    };
    d.registerFunction("some_circuit", new __internal_func_some_circuit(d));
}

in test_hash:
class __internal_func_some_circuit : public Domain::Function {
public:
    Domain::Variable a1;
    Domain::Variable a2;
    ...
    Domain::Variable a10;
    explicit __internal_func_some_circuit(Domain &d) :
        a1(), a2(), a3(), a4(), a5(), a6(), a7(), a8(), a9(), a10(),
        Domain::Function(d) {
        arg_map = {{"a1", &a1}, {"a2", &a2} ..., {"a10", &a10}};
        arg_pack = {&a1, &a2, &a3, &a4, &a5, &a6, &a7, &a8, &a9, &a10};
    }
    bool operator()() override {
        return domain.call("and_circuit", a1, a2) &&
               domain.call("or_circuit", a3, a4) &&
               domain.call("xor_circuit", a5, a6) &&
               domain.call("not_circuit", a7) &&
               a8.value >= 100 && a9.value >= 100 && a10.value >= 100;
    }
};
d.registerFunction("some_circuit", new __internal_func_some_circuit(d));

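Basically, the pointer solution creates a function lookup table func_dep_map, which the Domain class later uses to find the other functions this function depends on, and a vector of function pointers func_dep, which Domain fills with their pointers.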
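The linking pass itself is not shown in the post, so the following is only a guess at its shape, not the code from the repository. Function, Domain::link, Domain::get, AndCircuit, SomeCircuit and demo_link are simplified, hypothetical stand-ins, and arguments are plain ints instead of Domain::Variable.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-in for the Domain::Function of the post (assumption).
struct Function {
    std::map<std::string, std::size_t> func_dep_map;  // name -> slot index
    std::vector<Function*> func_dep;                  // filled by link()
    virtual ~Function() = default;
    virtual bool call(int a, int b) = 0;
};

class Domain {
public:
    void registerFunction(const std::string& name, std::unique_ptr<Function> f) {
        funcs_[name] = std::move(f);
    }

    // One-time pass after all modules are loaded: each name in
    // func_dep_map is looked up once and its pointer stored in func_dep.
    void link() {
        for (auto& entry : funcs_) {
            Function& f = *entry.second;
            f.func_dep.assign(f.func_dep_map.size(), nullptr);
            for (const auto& dep : f.func_dep_map) {
                auto it = funcs_.find(dep.first);
                if (it == funcs_.end())
                    throw std::runtime_error("unresolved: " + dep.first);
                f.func_dep.at(dep.second) = it->second.get();
            }
        }
    }

    Function* get(const std::string& name) { return funcs_.at(name).get(); }

private:
    std::unordered_map<std::string, std::unique_ptr<Function>> funcs_;
};

struct AndCircuit : Function {
    bool call(int a, int b) override { return a != 0 && b != 0; }
};

struct SomeCircuit : Function {
    enum func_dep_idx { and_circuit, func_dep_idx_end };
    SomeCircuit() { func_dep_map = {{"and_circuit", and_circuit}}; }
    // After link(), the dependency is reached by index, with no hashing.
    bool call(int a, int b) override {
        return func_dep[and_circuit]->call(a, b) && a >= b;
    }
};

inline bool demo_link(int a, int b) {
    Domain d;
    // Registration order does not matter; and_circuit is added last.
    d.registerFunction("some_circuit", std::make_unique<SomeCircuit>());
    d.registerFunction("and_circuit", std::make_unique<AndCircuit>());
    d.link();
    return d.get("some_circuit")->call(a, b);
}
```

Because link() runs only after every module has registered its functions, a function may depend on one that is defined in a module loaded later.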

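The enum provides an elegant and compact way to find the indices, as opposed to the map classes provided by metaprogramming libraries such as Fatal and boost::mpl, which are not convenient to use in this case.

This implementation relies heavily on boost::preprocessor; for more details, see my github repository.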