Simplest way to write parallel C/C++ modules to be used in Python

Keywords: parallel, module, method, simplest, C++, Python. Updated: 2023-10-16

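Brief background (not really necessary)

I have been working on an alternative (less resource-demanding?) Mean Shift module, written in C++, to the one in Scikit-Learn.

On the C++ side, I have been using the nanoflann library to build and search KD-trees.

Basically, I have two numpy arrays that I pass through Cython to my C++ MeanShift function, which returns the list of cluster centers it found.

It turns out to be faster, by a factor of about 7 (I am still actively working on it).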

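My question:

I would like to parallelize the most expensive parts of my C++ code, such as the for loop that runs until convergence. Since this C++ module will be imported into Python, I want to do it in the safest and simplest way possible.

I have thought about using OpenMP. Do you have any suggestions?

Thanks! Have a nice day.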

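Edit / code snippet

Thanks @bivouac0, I am now able to compile the whole thing.

Now I am struggling with the logical/technical side. Let me write out a snippet of the code I would like to parallelize.

I have a vector matches of type std::vector<std::pair<size_t, double> > and a fairly large array double samples[N]. I want to use the first element of each pair stored in matches to compute the access index into the larger samples array. This is how I do it: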

#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::pair<size_t, double> > searchResultPair;

// Returns a new[]-allocated array holding the mean x, y, z of the matched samples.
// The caller owns the result and must release it with delete[].
double* calcMean(size_t nMatches, searchResultPair matches,
                 double* samples) {
    double* returnArray = new double[3];
    double x = 0, y = 0, z = 0;  // running sums of each coordinate
    for (size_t i = 0; i < nMatches; i = i + 1) {
        // matches[i].first is the index of a 3D point in the flat samples array
        x = x + samples[3 * (matches[i].first)];
        y = y + samples[3 * (matches[i].first) + 1];
        z = z + samples[3 * (matches[i].first) + 2];
    }
    returnArray[0] = x/nMatches;
    returnArray[1] = y/nMatches;
    returnArray[2] = z/nMatches;
    return(returnArray);
}

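Is there a way to access the matches[i].first values concurrently?

I have already tried #pragma omp parallel for reduction(+:x,y,z) num_threads(n_threads), but it degraded performance: 1 thread was faster than 2 threads, which was faster than 4, which was faster than 8, and so on.

Does my question make sense? Am I going wrong somewhere? Managing a team of n_threads parallel threads to compute the partial sums x, y, z may be nothing but overhead, since the elements of the vector are stored contiguously...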
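To make the overhead concern concrete, here is a minimal sketch of the same reduction with two changes that are assumptions on my part, not something from the original module: the vector is taken by const reference (passing it by value copies it on every call), and an OpenMP if clause keeps the loop serial when there are few matches. The names calcMeanInto and kParallelThreshold are hypothetical, and the cutoff value is a placeholder that would have to be measured. Without -fopenmp the pragma is ignored and the function simply runs serially.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::size_t, double> > searchResultPair;

// Hypothetical cutoff: below this many matches the loop stays serial,
// because starting a thread team can cost more than the summation itself.
static const long kParallelThreshold = 100000;

// Writes the mean x, y, z of the matched samples into out[0..2].
// matches is taken by const reference so it is not copied on each call.
void calcMeanInto(const searchResultPair& matches, const double* samples,
                  double out[3]) {
    const long n = static_cast<long>(matches.size());
    double x = 0.0, y = 0.0, z = 0.0;

    // Each thread accumulates private partial sums; OpenMP combines them.
    #pragma omp parallel for reduction(+:x,y,z) if(n > kParallelThreshold)
    for (long i = 0; i < n; ++i) {
        const std::size_t base = 3 * matches[i].first;
        x += samples[base];
        y += samples[base + 1];
        z += samples[base + 2];
    }

    if (n == 0) {  // avoid dividing by zero for an empty result set
        out[0] = out[1] = out[2] = 0.0;
        return;
    }
    out[0] = x / n;
    out[1] = y / n;
    out[2] = z / n;
}
```

Writing into a caller-provided buffer also avoids one new[] per call, which may matter if the function sits in the innermost loop.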

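I could split the for loop above into 3 parts and try to parallelize each of them. Would that be a good idea? The computation there sits inside a while loop that is itself nested in another for loop, and it is the most important method in the whole module.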
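An alternative to splitting the inner loop is to parallelize the outermost for loop, so that each thread runs the whole while loop for its own seed point. The sketch below only illustrates that structure: convergeSeed is a hypothetical stand-in (it just pulls a seed toward the global mean of the samples) for the real KD-tree search and mean computation, and shiftAllSeeds is likewise an invented name. It assumes the samples are only read and that every iteration writes to its own seed, so no locking is needed.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the per-seed "while not converged" loop.
// It repeatedly moves the seed halfway toward the mean of all samples.
static void convergeSeed(const double* samples, std::size_t nSamples,
                         double* seed) {
    if (nSamples == 0) return;
    double mean[3] = {0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < nSamples; ++i)
        for (int k = 0; k < 3; ++k) mean[k] += samples[3 * i + k];
    for (int k = 0; k < 3; ++k) mean[k] /= static_cast<double>(nSamples);

    for (int iter = 0; iter < 100; ++iter) {
        double shift = 0.0;  // how far the seed moved in this step
        for (int k = 0; k < 3; ++k) {
            const double next = 0.5 * (seed[k] + mean[k]);
            shift += std::fabs(next - seed[k]);
            seed[k] = next;
        }
        if (shift < 1e-12) break;  // converged
    }
}

// Runs the convergence loop for every seed; seeds holds nSeeds 3D points.
void shiftAllSeeds(const double* samples, std::size_t nSamples,
                   double* seeds, std::size_t nSeeds) {
    const long n = static_cast<long>(nSeeds);
    // Seeds may need different numbers of iterations, hence dynamic scheduling.
    #pragma omp parallel for schedule(dynamic)
    for (long s = 0; s < n; ++s) {
        convergeSeed(samples, nSamples, seeds + 3 * s);
    }
}
```

Because the loop body touches only plain C++ data and no Python objects, the threads do not interact with the interpreter; whether to release the GIL around the call is a separate decision made on the Cython side.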

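Regarding the question above about compiling C++ code with OpenMP, you probably just need to include the gomp library. Here is a simple setup.py script that works for me...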

from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize, build_ext

# run as... python setup.py build_ext
ext = Extension(
    "entity_lookup",    # name of extension
    ["src/entity_lookup.pyx", "src/EntityLookupImpl.cpp", "src/IndexDictImpl.cpp"],
    language="c++",     # this causes Pyrex/Cython to create C++ source
    #include_dirs=[...],
    libraries=['gomp'], # or include explicitly with extra_link_args below
    #extra_link_args=['/usr/lib/x86_64-linux-gnu/libgomp.so.1'], # see above
    extra_compile_args=['-fopenmp', '-std=c++11'],
)
setup(
    name='EntityLookup',
    version='0.4',
    description='Package to match words and phrases to Entity labels',
    cmdclass={'build_ext': build_ext},  # belongs to setup(), not Extension()
    ext_modules=cythonize(ext),
)

Note the inclusion of gomp (a.k.a. libgomp.so.1). This is where GOMP_parallel is defined.
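If you are unsure whether -fopenmp actually reached the compiler, one quick check is the standard _OPENMP macro, which an OpenMP-enabled compiler defines. The helper below is a hypothetical addition (openmpStatus is not part of the module above) that could be exposed through the .pyx file for debugging.

```cpp
#include <string>

// Reports whether this translation unit was compiled with OpenMP support.
// _OPENMP is defined by the compiler (as a yyyymm date) only when OpenMP is on.
std::string openmpStatus() {
#ifdef _OPENMP
    return "OpenMP enabled (_OPENMP = " + std::to_string(_OPENMP) + ")";
#else
    return "OpenMP disabled: omp pragmas are being ignored";
#endif
}
```

If the second message comes back, the pragmas compile to nothing and the code runs single-threaded regardless of which library is linked.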

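To compile, run: python setup.py build_ext

I always use this code in place (not installed anywhere). To do that, you need to set up a link to the compiled entity_lookup.so, which ends up deep inside the "build" directory created by the script.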
