Performance issues in Python for small loops over simple array look-ups
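Why are simple loops and/or simple array look-ups so slow in Python?
Specifically, in the example below, Python (using pypy) is about 9 times slower than C++ (using -O2). What is the technical reason for the performance loss? Is it how Python loops are implemented in machine code? Differences in the optimizations the compilers apply? Memory management? Or something else?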
# File: timing.py
import sys
T = [
[ 0, 6, 7, 7, 7, 8, 6, 7, 8,19,20, 7, 7, 7, 7, 7, 7,21,22,19,20,21,22,29, 7, 7, 7, 7, 7,29,35,36, 7, 7, 7,35,36, 7, 7,],
[ 9, 7,10, 7, 7, 7, 1, 7,23, 7, 7, 1,23, 7, 7, 7, 7, 7, 7, 9,10,30,31, 7, 9,10,30,31, 7,23, 7, 7,23, 7, 7,30,31,30,31,],
[ 7, 7, 2,11,12, 7, 7, 7, 7, 7, 7,11,12,24,25,26,27, 7, 7, 7, 7, 7, 7, 7,24,25,26,27,32, 7, 7, 7,32,37,38, 7, 7,37,38,],
[13, 7,14, 7, 7, 7, 3, 7,28, 7, 7, 3,28, 7, 7, 7, 7, 7, 7,13,14,33,34, 7,13,14,33,34, 7,28, 7, 7,28, 7, 7,33,34,33,34,],
[15, 7,16, 7, 7, 7, 7, 7, 4, 7, 7, 7, 4, 7, 7, 7, 7, 7, 7, 7, 7,15,16, 7, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[17, 7,18, 7, 7, 7, 7, 7, 5, 7, 7, 7, 5, 7, 7, 7, 7, 7, 7, 7, 7,17,18, 7, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[19, 7,20, 7, 7, 7, 6, 7,29, 7, 7, 6,29, 7, 7, 7, 7, 7, 7,19,20,35,36, 7,19,20,35,36, 7,29, 7, 7,29, 7, 7,35,36,35,36,],
[ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[21, 7,22, 7, 7, 7, 7, 7, 8, 7, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7,21,22, 7, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 9, 1, 7, 7, 7,23, 1, 7,23, 9,10, 7, 7, 7, 7, 7, 7,30,31, 9,10,30,31,23, 7, 7, 7, 7, 7,23,30,31, 7, 7, 7,30,31, 7, 7,],
[ 7, 7,10, 1,23, 7, 7, 7, 7, 7, 7, 1,23, 9,10,30,31, 7, 7, 7, 7, 7, 7, 7, 9,10,30,31,23, 7, 7, 7,23,30,31, 7, 7,30,31,],
[24, 7,25, 7, 7, 7,11, 7,32, 7, 7,11,32, 7, 7, 7, 7, 7, 7,24,25,37,38, 7,24,25,37,38, 7,32, 7, 7,32, 7, 7,37,38,37,38,],
[26, 7,27, 7, 7, 7, 7, 7,12, 7, 7, 7,12, 7, 7, 7, 7, 7, 7, 7, 7,26,27, 7, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[13, 3, 7, 7, 7,28, 3, 7,28,13,14, 7, 7, 7, 7, 7, 7,33,34,13,14,33,34,28, 7, 7, 7, 7, 7,28,33,34, 7, 7, 7,33,34, 7, 7,],
[ 7, 7,14, 3,28, 7, 7, 7, 7, 7, 7, 3,28,13,14,33,34, 7, 7, 7, 7, 7, 7, 7,13,14,33,34,28, 7, 7, 7,28,33,34, 7, 7,33,34,],
[15, 7, 7, 7, 7, 4, 7, 7, 4, 7, 7, 7, 7, 7, 7, 7, 7,15,16, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,16, 7, 4, 7, 7, 7, 7, 7, 7, 7, 4, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[17, 7, 7, 7, 7, 5, 7, 7, 5, 7, 7, 7, 7, 7, 7, 7, 7,17,18, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,18, 7, 5, 7, 7, 7, 7, 7, 7, 7, 5, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[19, 6, 7, 7, 7,29, 6, 7,29,19,20, 7, 7, 7, 7, 7, 7,35,36,19,20,35,36,29, 7, 7, 7, 7, 7,29,35,36, 7, 7, 7,35,36, 7, 7,],
[ 7, 7,20, 6,29, 7, 7, 7, 7, 7, 7, 6,29,19,20,35,36, 7, 7, 7, 7, 7, 7, 7,19,20,35,36,29, 7, 7, 7,29,35,36, 7, 7,35,36,],
[21, 7, 7, 7, 7, 8, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7,21,22, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,22, 7, 8, 7, 7, 7, 7, 7, 7, 7, 8, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[30, 7,31, 7, 7, 7, 7, 7,23, 7, 7, 7,23, 7, 7, 7, 7, 7, 7, 7, 7,30,31, 7, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[24,11, 7, 7, 7,32,11, 7,32,24,25, 7, 7, 7, 7, 7, 7,37,38,24,25,37,38,32, 7, 7, 7, 7, 7,32,37,38, 7, 7, 7,37,38, 7, 7,],
[ 7, 7,25,11,32, 7, 7, 7, 7, 7, 7,11,32,24,25,37,38, 7, 7, 7, 7, 7, 7, 7,24,25,37,38,32, 7, 7, 7,32,37,38, 7, 7,37,38,],
[26, 7, 7, 7, 7,12, 7, 7,12, 7, 7, 7, 7, 7, 7, 7, 7,26,27, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,27, 7,12, 7, 7, 7, 7, 7, 7, 7,12, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[33, 7,34, 7, 7, 7, 7, 7,28, 7, 7, 7,28, 7, 7, 7, 7, 7, 7, 7, 7,33,34, 7, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[35, 7,36, 7, 7, 7, 7, 7,29, 7, 7, 7,29, 7, 7, 7, 7, 7, 7, 7, 7,35,36, 7, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[30, 7, 7, 7, 7,23, 7, 7,23, 7, 7, 7, 7, 7, 7, 7, 7,30,31, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,31, 7,23, 7, 7, 7, 7, 7, 7, 7,23, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[37, 7,38, 7, 7, 7, 7, 7,32, 7, 7, 7,32, 7, 7, 7, 7, 7, 7, 7, 7,37,38, 7, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[33, 7, 7, 7, 7,28, 7, 7,28, 7, 7, 7, 7, 7, 7, 7, 7,33,34, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,34, 7,28, 7, 7, 7, 7, 7, 7, 7,28, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[35, 7, 7, 7, 7,29, 7, 7,29, 7, 7, 7, 7, 7, 7, 7, 7,35,36, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,36, 7,29, 7, 7, 7, 7, 7, 7, 7,29, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[37, 7, 7, 7, 7,32, 7, 7,32, 7, 7, 7, 7, 7, 7, 7, 7,37,38, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
[ 7, 7,38, 7,32, 7, 7, 7, 7, 7, 7, 7,32, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,],
]
M = range(39)
idempotents = [0,2,6,7,8,9,11,12,14,16,17,19,21,25,27]
omega = [0,7,2,7,7,7,6,7,8,9,7,11,12,7,14,7,16,17,7,19,7,21,7,7,7,25,7,27,7,7,7,7,7,7,7,7,7,7,7]
def check():
    for e in idempotents:
        for x in M:
            ex = T[e][x]
            for s in M:
                es = T[e][s]
                for f in idempotents:
                    exf = T[ex][f]
                    esf = T[es][f]
                    for y in M:
                        exfy = omega[T[exf][y]]
                        for t in M:
                            tesf = omega[T[t][esf]]
                            if T[T[exfy][exf]][tesf] != T[T[exfy][esf]][tesf]:
                                return 0
    return 1
sys.exit(check())
The C++ code (requires C++11 because of the new array initialization and iteration syntax):
// File: timing.cc
// Compile via 'g++ -std=c++11 -O2 timing.cc'
// Run via 'time ./a.out'
#include <vector>
#include <cstddef>
int main(int, char **) {
const size_t N = 39;
typedef unsigned element_t;
const std::vector<std::vector<element_t>> T{{
{{ 0, 6, 7, 7, 7, 8, 6, 7, 8,19,20, 7, 7, 7, 7, 7, 7,21,22,19,20,21,22,29, 7, 7, 7, 7, 7,29,35,36, 7, 7, 7,35,36, 7, 7,}},
{{ 9, 7,10, 7, 7, 7, 1, 7,23, 7, 7, 1,23, 7, 7, 7, 7, 7, 7, 9,10,30,31, 7, 9,10,30,31, 7,23, 7, 7,23, 7, 7,30,31,30,31,}},
{{ 7, 7, 2,11,12, 7, 7, 7, 7, 7, 7,11,12,24,25,26,27, 7, 7, 7, 7, 7, 7, 7,24,25,26,27,32, 7, 7, 7,32,37,38, 7, 7,37,38,}},
{{13, 7,14, 7, 7, 7, 3, 7,28, 7, 7, 3,28, 7, 7, 7, 7, 7, 7,13,14,33,34, 7,13,14,33,34, 7,28, 7, 7,28, 7, 7,33,34,33,34,}},
{{15, 7,16, 7, 7, 7, 7, 7, 4, 7, 7, 7, 4, 7, 7, 7, 7, 7, 7, 7, 7,15,16, 7, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{17, 7,18, 7, 7, 7, 7, 7, 5, 7, 7, 7, 5, 7, 7, 7, 7, 7, 7, 7, 7,17,18, 7, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{19, 7,20, 7, 7, 7, 6, 7,29, 7, 7, 6,29, 7, 7, 7, 7, 7, 7,19,20,35,36, 7,19,20,35,36, 7,29, 7, 7,29, 7, 7,35,36,35,36,}},
{{ 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{21, 7,22, 7, 7, 7, 7, 7, 8, 7, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7,21,22, 7, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 9, 1, 7, 7, 7,23, 1, 7,23, 9,10, 7, 7, 7, 7, 7, 7,30,31, 9,10,30,31,23, 7, 7, 7, 7, 7,23,30,31, 7, 7, 7,30,31, 7, 7,}},
{{ 7, 7,10, 1,23, 7, 7, 7, 7, 7, 7, 1,23, 9,10,30,31, 7, 7, 7, 7, 7, 7, 7, 9,10,30,31,23, 7, 7, 7,23,30,31, 7, 7,30,31,}},
{{24, 7,25, 7, 7, 7,11, 7,32, 7, 7,11,32, 7, 7, 7, 7, 7, 7,24,25,37,38, 7,24,25,37,38, 7,32, 7, 7,32, 7, 7,37,38,37,38,}},
{{26, 7,27, 7, 7, 7, 7, 7,12, 7, 7, 7,12, 7, 7, 7, 7, 7, 7, 7, 7,26,27, 7, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{13, 3, 7, 7, 7,28, 3, 7,28,13,14, 7, 7, 7, 7, 7, 7,33,34,13,14,33,34,28, 7, 7, 7, 7, 7,28,33,34, 7, 7, 7,33,34, 7, 7,}},
{{ 7, 7,14, 3,28, 7, 7, 7, 7, 7, 7, 3,28,13,14,33,34, 7, 7, 7, 7, 7, 7, 7,13,14,33,34,28, 7, 7, 7,28,33,34, 7, 7,33,34,}},
{{15, 7, 7, 7, 7, 4, 7, 7, 4, 7, 7, 7, 7, 7, 7, 7, 7,15,16, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,16, 7, 4, 7, 7, 7, 7, 7, 7, 7, 4, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7,15,16, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{17, 7, 7, 7, 7, 5, 7, 7, 5, 7, 7, 7, 7, 7, 7, 7, 7,17,18, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,18, 7, 5, 7, 7, 7, 7, 7, 7, 7, 5, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7,17,18, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{19, 6, 7, 7, 7,29, 6, 7,29,19,20, 7, 7, 7, 7, 7, 7,35,36,19,20,35,36,29, 7, 7, 7, 7, 7,29,35,36, 7, 7, 7,35,36, 7, 7,}},
{{ 7, 7,20, 6,29, 7, 7, 7, 7, 7, 7, 6,29,19,20,35,36, 7, 7, 7, 7, 7, 7, 7,19,20,35,36,29, 7, 7, 7,29,35,36, 7, 7,35,36,}},
{{21, 7, 7, 7, 7, 8, 7, 7, 8, 7, 7, 7, 7, 7, 7, 7, 7,21,22, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,22, 7, 8, 7, 7, 7, 7, 7, 7, 7, 8, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7,21,22, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{30, 7,31, 7, 7, 7, 7, 7,23, 7, 7, 7,23, 7, 7, 7, 7, 7, 7, 7, 7,30,31, 7, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{24,11, 7, 7, 7,32,11, 7,32,24,25, 7, 7, 7, 7, 7, 7,37,38,24,25,37,38,32, 7, 7, 7, 7, 7,32,37,38, 7, 7, 7,37,38, 7, 7,}},
{{ 7, 7,25,11,32, 7, 7, 7, 7, 7, 7,11,32,24,25,37,38, 7, 7, 7, 7, 7, 7, 7,24,25,37,38,32, 7, 7, 7,32,37,38, 7, 7,37,38,}},
{{26, 7, 7, 7, 7,12, 7, 7,12, 7, 7, 7, 7, 7, 7, 7, 7,26,27, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,27, 7,12, 7, 7, 7, 7, 7, 7, 7,12, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7,26,27, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{33, 7,34, 7, 7, 7, 7, 7,28, 7, 7, 7,28, 7, 7, 7, 7, 7, 7, 7, 7,33,34, 7, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{35, 7,36, 7, 7, 7, 7, 7,29, 7, 7, 7,29, 7, 7, 7, 7, 7, 7, 7, 7,35,36, 7, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{30, 7, 7, 7, 7,23, 7, 7,23, 7, 7, 7, 7, 7, 7, 7, 7,30,31, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,31, 7,23, 7, 7, 7, 7, 7, 7, 7,23, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7,30,31, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{37, 7,38, 7, 7, 7, 7, 7,32, 7, 7, 7,32, 7, 7, 7, 7, 7, 7, 7, 7,37,38, 7, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{33, 7, 7, 7, 7,28, 7, 7,28, 7, 7, 7, 7, 7, 7, 7, 7,33,34, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,34, 7,28, 7, 7, 7, 7, 7, 7, 7,28, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7,33,34, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{35, 7, 7, 7, 7,29, 7, 7,29, 7, 7, 7, 7, 7, 7, 7, 7,35,36, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,36, 7,29, 7, 7, 7, 7, 7, 7, 7,29, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7,35,36, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{37, 7, 7, 7, 7,32, 7, 7,32, 7, 7, 7, 7, 7, 7, 7, 7,37,38, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
{{ 7, 7,38, 7,32, 7, 7, 7, 7, 7, 7, 7,32, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7,37,38, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,}},
}};
const std::vector<element_t> idempotents{{0,2,6,7,8,9,11,12,14,16,17,19,21,25,27}};
const std::vector<element_t> omega{{0,7,2,7,7,7,6,7,8,9,7,11,12,7,14,7,16,17,7,19,7,21,7,7,7,25,7,27,7,7,7,7,7,7,7,7,7,7,7}};
    element_t ex, es, exf, esf, exfy, tesf;
    for(auto e: idempotents) {
        for(size_t x = 0; x < N; ++x) {
            ex = T[e][x];
            for(size_t s = 0; s < N; ++s) {
                es = T[e][s];
                for(auto f: idempotents) {
                    exf = T[ex][f];
                    esf = T[es][f];
                    for(size_t y = 0; y < N; ++y) {
                        exfy = omega[T[exf][y]];
                        for(size_t t = 0; t < N; ++t) {
                            tesf = omega[T[t][esf]];
                            if(T[T[exfy][exf]][tesf] != T[T[exfy][esf]][tesf])
                                return 0;
    }}}}}}
    return 1;
}
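(Please don't ask about the detailed functionality of the code. Roughly speaking, it implements a decision procedure in the context of algebraic formal language theory; the code verifies an identity of the monoid given by the multiplication table T. In particular, the code is not an artificial example but a real-world application. Of course, one can debate what "application" means in the context of formal language theory.)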
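- time python timing.py
  outputs 2m18.389s
- time pypy timing.py
  outputs 0m9.329s
- g++ -std=c++11 -O2 timing.cc && time ./a.out
  outputs 0m1.064s
Edits
- For a fairer comparison, I applied some optimizations by hand that g++ seems to incorporate automatically. I reordered the loops and moved the variable assignments as far outward as possible (as suggested in the comments). This gave a speedup factor of about 2.5 for pypy and about 2 for python. Also for the sake of fairness, I used the dynamically sized std::vector rather than the constant-size std::array.
- I removed the side notes on why numpy is slower (the comments suggest that I did not use it properly) and on why executing the loops in the main part of the script is slower (it is common knowledge that Python is faster with local variables than with global ones).
- I am aware that C++ and Python have different scoping. I am also aware that C++ is compiled while Python is usually interpreted (which is why I use pypy). I would like to know the eventual technical reasons why pypy is so slow on this particular piece of code. (Using numpy and numba might give near-native performance, but that is not in the spirit of my question, because it effectively shifts all of the computation back into C code.) I clarified my question accordingly.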
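The hand optimization described in the edits (moving assignments outward so that loop-invariant look-ups are not repeated) can be sketched roughly as follows. This is only an illustration under assumptions: the function name `check_hoisted`, the explicit parameters, and the helper names (`Te`, `ex_row`, `tesf_values`, ...) are hypothetical and not taken from the question's actual optimized script.

```python
def check_hoisted(T, idempotents, omega):
    """Same test as check() in timing.py, with invariant look-ups hoisted.

    Returns 1 if the identity holds for every tuple, 0 otherwise.
    (Function and variable names are illustrative assumptions.)
    """
    M = range(len(T))
    for e in idempotents:
        Te = T[e]                          # row T[e] is fetched once per e
        for x in M:
            ex_row = T[Te[x]]              # row T[ex]
            for s in M:
                es_row = T[Te[s]]          # row T[es]
                for f in idempotents:
                    exf = ex_row[f]
                    esf = es_row[f]
                    Texf = T[exf]
                    # tesf depends only on t and esf, not on y,
                    # so compute all its values once per (exf, esf) pair.
                    tesf_values = [omega[T[t][esf]] for t in M]
                    for y in M:
                        exfy_row = T[omega[Texf[y]]]   # row T[exfy]
                        lhs_row = T[exfy_row[exf]]     # row T[T[exfy][exf]]
                        rhs_row = T[exfy_row[esf]]     # row T[T[exfy][esf]]
                        for tesf in tesf_values:
                            if lhs_row[tesf] != rhs_row[tesf]:
                                return 0
    return 1
```

Passing the tables as arguments also makes every name a local variable inside the function, which matches the remark above that Python is faster with locals than with globals.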
numpy and Python loops
Short answer
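If you use numpy properly, you push everything back down to the C level again.
Long answer
How can that be?
I will show here a bit of what I meant in my comment, and how to use numpy to avoid unnecessary loops. Let's look at the original code, assuming you have already converted all lists to numpy arrays with np.asarray(list).
for e in idempotents:
    for x in M:
        ex = T[e][x]
translates directly to:
T[idempotents]
Why?
Numpy arrays can be indexed with index arrays. For example, T[0] returns row 0 of the matrix T across all columns (in general, across all following dimensions), so T[0]==T[0,:] for 2d arrays. Since the loops run over all idempotents as the first index and then over all elements T[e][x] of the row T[e], T[idempotents] is identical to these two loops.
Details are in the NumPy documentation on indexing.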
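As a small check of the claim that indexing with an index array replaces the two loops, here is a sketch on a toy table. The 3x3 table and the index list are made-up stand-ins, not the T and idempotents from the question.

```python
import numpy as np

# Toy 3x3 table (addition modulo 3); a stand-in for the 39x39 table T.
T = np.asarray([[0, 1, 2],
                [1, 2, 0],
                [2, 0, 1]])
idempotents = np.asarray([0, 2])  # hypothetical index list for illustration

# Explicit double loop, as in the original code.
looped = np.empty((len(idempotents), T.shape[1]), dtype=T.dtype)
for i, e in enumerate(idempotents):
    for x in range(T.shape[1]):
        looped[i, x] = T[e][x]

# The same values obtained with a single index-array look-up.
indexed = T[idempotents]

print(indexed.shape)                   # one row per entry of idempotents
print(np.array_equal(indexed, looped))
```

The look-up produces one row per entry of `idempotents`, so the result has shape `(len(idempotents), number of columns)`.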
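ex, es
Next is
for e in idempotents:
    for x in M:
        ex = T[e][x]
        for s in M:
            es = T[e][s]
Since es runs over exactly the same values as ex, there is no need to redo the whole loop; this translates to
es=ex
Because this is Python, the array es is not even copied, only referenced.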
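To illustrate the difference between binding a second name and copying, here is a short sketch on made-up data. The toy table is an assumption for illustration only.

```python
import numpy as np

T = np.arange(9).reshape(3, 3)     # toy table, not the T from the question
idempotents = np.asarray([0, 2])   # hypothetical index list

ex = T[idempotents]   # indexing with an index array allocates a new array
es = ex               # plain assignment: a second name for the same array

same_object = es is ex                        # both names refer to one object
shares_buffer = np.shares_memory(es, ex)      # and therefore one data buffer
independent_of_T = not np.shares_memory(ex, T)

print(same_object, shares_buffer, independent_of_T)
```

One consequence worth keeping in mind: because `es` and `ex` are the same object, an in-place modification through one name is visible through the other.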
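exf, esf
I am now skipping some of the for loops in the code snippet.
for f in idempotents:
    exf = T[ex][f]
    esf = T[es][f]
Now you are indexing with the idempotents vector again, so we can do the same thing with numpy in the same way: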
T[T[idempotents]]
print T[T[idempotents]].shape
>> (15, 39, 39)
Now we have an array of shape (15, 39, 39), because for each element of the 2d array T[idempotents] you get back a row of T, which is basically the third loop.
exf = T[T[idempotents]]
esf = exf
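The shape argument can be checked on a small example. The sketch below uses a made-up 4x4 table with two index entries, so the expected shape is (2, 4, 4) instead of (15, 39, 39); the names `Ti`, `TTi` and `matches` are illustrative.

```python
import numpy as np

# Toy 4x4 table (addition modulo 4); a stand-in for the 39x39 table T.
T = np.asarray([[0, 1, 2, 3],
                [1, 2, 3, 0],
                [2, 3, 0, 1],
                [3, 0, 1, 2]])
idempotents = np.asarray([0, 2])  # hypothetical index list

Ti = T[idempotents]   # shape (2, 4):    Ti[i, x]     == T[idempotents[i]][x]
TTi = T[Ti]           # shape (2, 4, 4): TTi[i, x, k] == T[Ti[i, x]][k]

# Verify the element-wise meaning against explicit nested look-ups.
matches = all(
    TTi[i, x, k] == T[T[e][x]][k]
    for i, e in enumerate(idempotents)
    for x in range(4)
    for k in range(4)
)
print(TTi.shape, matches)
```

Each entry of `Ti` is replaced by a whole row of `T`, which is where the extra trailing dimension comes from.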
From here on it gets more complicated, so I will skip the rest. It would be lines like the following:
Ti = T[idempotents] # T[e][x] == T[e][s] by loop definition
TTi = T[Ti] # T[T[e][x]]
TTi.shape = -1 , 39 # bring first index back into shape
exf = TTi[:, idempotents] # T[T[e][x]][f]
esf = exf # T[T[e][s]][f] == T[T[e][x]][f] by loop definition
Texf = T[exf].ravel()
exfy = omega[Texf]
TTexf = T.T[exf].ravel() # tesf = omega[T[t][esf]] # since I cannot index fast along t I use the transpose of T
tesf = omega[TTexf]
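The lines above stop before the final comparison. One possible way to finish the idea is sketched below. It is not the answer's own method but an alternative under stated assumptions: it first tabulates, for every pair (a, b) of table elements, whether the comparison holds for all y and t, and then looks up the pairs (exf, esf) that actually occur. The function name `check_vectorized` and all intermediate names are hypothetical.

```python
import numpy as np

def check_vectorized(T, idempotents, omega):
    """Vectorized variant of check(); returns 1 if the identity holds, else 0.

    Assumption: T is a square table whose entries are valid row/column indices.
    """
    T = np.asarray(T)
    idem = np.asarray(idempotents)
    omega = np.asarray(omega)
    n = T.shape[0]
    elems = np.arange(n)

    exfy = omega[T]       # exfy[a, y] = omega[T[a][y]]
    tesf = omega[T].T     # tesf[b, t] = omega[T[t][b]]

    # Broadcast over four axes (a, b, y, t).
    A = elems[:, None, None, None]
    B = elems[None, :, None, None]
    P = exfy[:, None, :, None]
    Q = tesf[None, :, None, :]
    lhs = T[T[P, A], Q]   # T[T[exfy][a]][tesf]
    rhs = T[T[P, B], Q]   # T[T[exfy][b]][tesf]
    ok = (lhs == rhs).all(axis=(2, 3))   # ok[a, b]: holds for all y and t

    # exf[e, x, f] = T[T[e][x]][f]; esf takes the same values, indexed by s.
    exf = T[T[idem]][:, :, idem]
    pairs_ok = ok[exf[:, :, None, :], exf[:, None, :, :]]  # axes (e, x, s, f)
    return int(pairs_ok.all())

# Tiny made-up tables: the first satisfies the identity, the second does not.
print(check_vectorized([[0, 0], [0, 1]], [0, 1], [0, 1]))
print(check_vectorized([[0, 1], [1, 0]], [0], [0, 0]))
```

For a 39x39 table the intermediate arrays have 39^4 (about 2.3 million) entries, which is small enough to hold in memory; broadcasting over all six loop indices at once would need roughly 225 times as many.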
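Because Python interprets its input at runtime, whereas a C++ compiler can perform major optimizations ahead of time, especially for for-loops over fixed-size arrays.
Some compilers will even compute the entire result of nested for loops at compile time.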
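To make the "interprets at runtime" point a little more concrete, the standard-library `dis` module can list the bytecode instructions that the CPython interpreter executes for a single nested look-up. This is only a sketch: the helper name `lookup` is made up, the exact instruction names vary between Python versions, and it says nothing about what PyPy's JIT does with the same code.

```python
import dis

def lookup(T, e, x):
    # The kind of nested look-up that the innermost loops perform.
    return T[e][x]

# Each listed instruction is dispatched by the interpreter at runtime,
# and each subscript operation works on generic, dynamically typed objects.
instructions = list(dis.get_instructions(lookup))
for ins in instructions:
    print(ins.opname, ins.argrepr)
```

A C++ compiler, by contrast, knows the element type at compile time and can typically reduce `T[e][x]` on a vector of vectors to a few address computations and loads.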