OpenCV functions lock up when called from a Python script that is itself called by a multithreaded C++ program

Last updated: 2023-10-16

I have a C++ application that calls Python functions from several threads. Everything works fine until I try to use OpenCV functions from Python:

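If they are called from the same thread that initialized the interpreter, they work fine.
If they are called from any other C++ thread, they hang forever, waiting for a mutex to be released.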

Basically, I have two files:

script.py:

import cv2

def foo():
    print('foo_in')
    cv2.imread('sample.jpg')
    print('foo_out')

main.cpp:

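#include <pthread.h>
#include <pybind11/embed.h>

pybind11::handle g_main;

// Thread entry point: calls foo() defined in script.py.
void* foo(void*)
{
    g_main.attr("foo")();
    return nullptr;  // a non-void function must return a value
}

int main()
{
    pybind11::scoped_interpreter guard;  // start the embedded interpreter
    pybind11::eval_file("script.py");    // defines foo() in __main__
    g_main = pybind11::module::import("__main__");
    foo(nullptr);                        // call from the main thread: works
    pthread_t thread;
    pthread_create(&thread, nullptr, &foo, nullptr);  // call from a second thread: hangs
    pthread_join(thread, nullptr);
    return 0;
}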

When I run the C++ program, I get:

foo_in
foo_out
foo_in

and then it hangs forever.

As you can see, the first call to cv2.imread returns, but the second one (the one made from the other thread) never does.

When I strace the thread's PID, I get the following lines:

futex(0x7fe7e6b3e364, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 13961, {1550596187, 546432000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x7fe7e6b3e3e0, FUTEX_WAKE_PRIVATE, 1) = 0

printed over and over again, which makes me think the thread is waiting for a mutex to be released.
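The repeated timed waits that end in ETIMEDOUT are consistent with CPython 3's GIL implementation as I understand it, although the trace alone does not prove it: a thread that wants the GIL waits on a condition variable for at most the interpreter's switch interval, asks the current holder to drop the GIL when the wait times out, and then waits again. The holder only acts on that request while it is executing Python bytecode, so a main thread blocked in pthread_join never hands the GIL over. A minimal sketch for inspecting that interval, assuming an unmodified CPython 3 interpreter:

```python
import sys

# Longest time a thread waiting for the GIL sleeps before it asks the
# current holder to drop it (CPython's default is 5 ms).
interval = sys.getswitchinterval()
print(interval)  # 0.005 on a default CPython build
```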

I then tried to understand what was going on by taking a backtrace with gdb:

#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007fe7e667948f in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#2  0x00007fe7e6679979 in PyEval_RestoreThread () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#3  0x00007fe7e669968b in PyGILState_Ensure () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#4  0x00007fe7e3fa7635 in PyEnsureGIL::PyEnsureGIL (this=<synthetic pointer>) at <opencv>/modules/python/src2/cv2.cpp:83
#5  NumpyAllocator::deallocate (this=<optimized out>, u=0x7fe7a80008c0) at <opencv>/modules/python/src2/cv2.cpp:208
#6  0x00007fe7d88e17c2 in cv::MatAllocator::unmap (this=<optimized out>, u=<optimized out>) at <opencv>/modules/core/src/matrix.cpp:18
#7  0x00007fe7e3fa7dc8 in cv::Mat::release (this=0x7fe7ae8018e0) at <opencv>/modules/core/include/opencv2/core/mat.inl.hpp:808
#8  cv::Mat::~Mat (this=0x7fe7ae8018e0, __in_chrg=<optimized out>) at <opencv>/modules/core/include/opencv2/core/mat.inl.hpp:694
#9  pyopencv_from<cv::Mat> (m=...) at <opencv>/modules/python/src2/cv2.cpp:451
#10 0x00007fe7e3faa08c in pyopencv_cv_imread (args=<optimized out>, kw=<optimized out>) at <opencv>/build/modules/python_bindings_generator/pyopencv_generated_funcs.h:10588
#11 0x00007fe7e6575049 in PyCFunction_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#12 0x00007fe7e66811c5 in PyEval_EvalFrameEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#13 0x00007fe7e6711cbc in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#14 0x00007fe7e6711d93 in PyEval_EvalCodeEx () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#15 0x00007fe7e6599ac8 in ?? () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#16 0x00007fe7e664e55e in PyObject_Call () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#17 0x00007fe7e6710947 in PyEval_CallObjectWithKeywords () from /usr/lib/x86_64-linux-gnu/libpython3.5m.so.1.0
#18 0x00000000004369de in pybind11::detail::simple_collector<(pybind11::return_value_policy)1>::call (this=0x7fe7ae801e80, ptr=0x7fe7e6eaef28) at <pybind11>/pybind11/cast.h:1953
#19 0x00000000004334f3 in pybind11::detail::object_api<pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr> >::operator()<(pybind11::return_value_policy)1> (this=0x7fe7ae801ed0)
    at <pybind11>/pybind11/cast.h:2108
#20 0x0000000000424336 in foo () at main.cpp:11

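I tried moving the Python interpreter initialization into the foo function, and then it worked (I only had to remove the first call to foo, because the interpreter can only be initialized once per application).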

This makes me think that cv2.imread only returns when it is called from the same thread that initialized the interpreter.

The same thing happens if I replace the call to cv2.imread with any other OpenCV function; I tested it with cv2.imwrite and cv2.projectPoints.

Any idea what is going on, and how to fix it while still being able to call OpenCV functions from different threads?

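It turns out the problem was that I was running Python code without holding the GIL (Global Interpreter Lock). The GIL is initially held by the thread that initialized the interpreter, and that thread must explicitly release it before any other thread can acquire it.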

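Execution hangs on the cv2.imread statement rather than on print('foo_in') because the Python interpreter does not make sure it holds the GIL when it is called from C++ (so pure Python statements run in a thread-unsafe way). The C++ code that cv2.* functions call under the hood, however, does make sure it holds the GIL before running (see PyEnsureGIL and PyGILState_Ensure in frames #4 and #3 of the backtrace above), and that is where it blocks.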
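The deadlock can be reproduced without embedding or OpenCV using a pure-Python model, sketched here under loose assumptions: a threading.RLock stands in for the GIL (re-entrant, because PyGILState_Ensure succeeds immediately on the thread that already holds the GIL), and the hypothetical ensure_gil_and_call plays the role of the PyEnsureGIL guard inside cv2. A timeout replaces the real infinite wait so the model terminates:

```python
import threading

# Hypothetical model: an RLock standing in for the GIL. It is re-entrant
# because PyGILState_Ensure succeeds at once on the thread holding the GIL.
gil = threading.RLock()


def ensure_gil_and_call(timeout):
    """Model of cv2's PyEnsureGIL guard. Returns True if the lock was
    obtained within `timeout` seconds (the real call waits forever)."""
    if not gil.acquire(timeout=timeout):
        return False
    try:
        return True  # the OpenCV work would run here
    finally:
        gil.release()


def broken_program(timeout=0.2):
    results = []
    gil.acquire()  # like scoped_interpreter: the init thread owns the "GIL"
    try:
        # Same thread as the initializer: the re-entrant acquire succeeds.
        results.append(ensure_gil_and_call(timeout))
        worker = threading.Thread(
            target=lambda: results.append(ensure_gil_and_call(timeout)))
        worker.start()
        worker.join()  # main waits here while still holding the "GIL"
    finally:
        gil.release()
    return results


print(broken_program())  # [True, False]
```

The first call succeeds and the second times out, mirroring the foo_in/foo_out pair followed by a lone foo_in.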

I solved the problem by explicitly releasing and acquiring the GIL:

main.cpp:

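#include <pthread.h>
#include <pybind11/embed.h>

pybind11::handle g_main;

void* foo(void*)
{
    // Hold the GIL for the duration of the call, whichever thread we are on.
    pybind11::gil_scoped_acquire gil;
    g_main.attr("foo")();
    return nullptr;
}

int main()
{
    pybind11::scoped_interpreter guard;
    pybind11::eval_file("../script.py");
    g_main = pybind11::module::import("__main__");
    // Release the GIL this thread has held since the interpreter was
    // initialized; it is re-acquired when nogil goes out of scope,
    // before guard finalizes the interpreter.
    pybind11::gil_scoped_release nogil;
    foo(nullptr);
    pthread_t thread;
    pthread_create(&thread, nullptr, &foo, nullptr);
    pthread_join(thread, nullptr);
    return 0;
}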
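A pure-Python model can also illustrate why this fix works. The two context managers below are hypothetical sketches that mimic pybind11::gil_scoped_acquire and pybind11::gil_scoped_release on a plain threading.RLock standing in for the GIL; they are not pybind11's implementation. Once the initializing thread gives up the lock for the rest of main, every call, from any thread, holds it only for its own duration:

```python
import threading
from contextlib import contextmanager

# Hypothetical model: an RLock standing in for the GIL.
gil = threading.RLock()


@contextmanager
def gil_scoped_acquire():
    """Mimics pybind11::gil_scoped_acquire: hold the lock for the scope."""
    gil.acquire()
    try:
        yield
    finally:
        gil.release()


@contextmanager
def gil_scoped_release():
    """Mimics pybind11::gil_scoped_release: drop the lock for the scope
    and take it back on exit."""
    gil.release()
    try:
        yield
    finally:
        gil.acquire()


def foo(log, label):
    with gil_scoped_acquire():
        log.append(label)  # the Python/OpenCV work would run here


def fixed_program():
    log = []
    gil.acquire()  # like scoped_interpreter: the init thread owns the "GIL"
    try:
        with gil_scoped_release():
            foo(log, "main")
            worker = threading.Thread(target=foo, args=(log, "worker"))
            worker.start()
            worker.join()  # the "GIL" is free, so the worker can run
    finally:
        gil.release()
    return log


print(fixed_program())  # ['main', 'worker']
```

Leaving the gil_scoped_release block re-takes the lock before the final release, the same ordering as nogil being destroyed before guard in main.cpp.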

Now everything works, and I get the expected output:

foo_in
foo_out
foo_in
foo_out