PyArray_Check gives Segmentation Fault with Cython/C++

Keywords: C++, segmentation fault, Cython, Check, PyArray. Last updated: 2023-10-16

Thanks in advance, everyone.

I would like to know the correct way to #include all the numpy headers, and the correct way to parse a numpy array using Cython and C++. Here is my attempt:

// cpp_parser.h 
#ifndef _FUNC_H_
#define _FUNC_H_
#include <Python.h>
#include <numpy/arrayobject.h>
void parse_ndarray(PyObject *);
#endif

I know this is probably wrong. I also tried other options, but none of them worked.

// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>
using namespace std;
void parse_ndarray(PyObject *obj) {
    if (PyArray_Check(obj)) { // this throws seg fault
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

The PyArray_Check routine causes a segmentation fault. PyArray_CheckExact does not crash, but it is not exactly what I want.
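For context on why PyArray_CheckExact is not a drop-in substitute: PyArray_Check is a subtype check, so it also accepts ndarray subclasses, whereas PyArray_CheckExact only accepts objects whose type is exactly ndarray. The sketch below models the two checks in plain Python, with list as a stand-in type so that it runs without NumPy. The names type_check, type_check_exact and TaggedList are made up for illustration.

```python
def type_check(obj, tp):
    """Rough model of PyObject_TypeCheck: exact type or any subclass."""
    return type(obj) is tp or issubclass(type(obj), tp)


def type_check_exact(obj, tp):
    """Rough model of a *_CheckExact macro: the type must match exactly."""
    return type(obj) is tp


class TaggedList(list):
    """Hypothetical subclass, standing in for an ndarray subclass."""


plain = [1, 2, 3]
tagged = TaggedList([1, 2, 3])

print(type_check(plain, list), type_check_exact(plain, list))    # True True
print(type_check(tagged, list), type_check_exact(tagged, list))  # True False
```

With NumPy installed, the same distinction is isinstance(x, np.ndarray) versus type(x) is np.ndarray.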

# parser.pxd
cdef extern from "cpp_parser.h": 
    cdef void parse_ndarray(object)

The implementation file is:

# parser.pyx
import numpy as np
cimport numpy as np
def py_parse_array(object x):
    assert isinstance(x, np.ndarray)
    parse_ndarray(x)

The setup.py script is:

# setup.py
from setuptools import setup, Extension  # distutils was removed in Python 3.12
from Cython.Build import cythonize
import numpy as np
ext = Extension(
    name='parser',
    sources=['parser.pyx', 'cpp_parser.cpp'],
    language='c++',
    include_dirs=[np.get_include()],
    extra_compile_args=['-fPIC'],
)
setup(
    name='parser',
    ext_modules=cythonize([ext]),
)

And finally the test script:

# run_test.py
import numpy as np
from parser import py_parse_array
x = np.arange(10)
py_parse_array(x)

I have created a git repository with all of the scripts above: https://github.com/giantwhale/study_cython_numpy/

Quick fix (read on for more details and a more sophisticated approach):

You need to initialize the variable PyArray_API in every cpp file that uses numpy functionality, by calling import_array():

// cpp_parser.cpp
#include "cpp_parser.h"
#include <iostream>
using namespace std;
// This is only a trick to make sure import_array() is called
// exactly once, when the *.so is loaded.
int init_numpy() {
    import_array(); // prints and sets a Python error, then returns, if the import fails
    return 0;
}
const static int numpy_initialized = init_numpy();
void parse_ndarray(PyObject *obj) { // may now be called any number of times
    if (PyArray_Check(obj)) {
        cout << "PyArray_Check Passed" << endl;
    } else {
        cout << "PyArray_Check Failed" << endl;
    }
}

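You can also call _import_array() directly. It returns a negative number on failure, which lets you do your own error handling. See the NumPy sources for the definition of import_array.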

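Warning: As @isra60 pointed out, _import_array()/import_array() can only be called once Python has been initialized, i.e. after Py_Initialize() has been called. This is always the case for an extension, but not necessarily when the Python interpreter is embedded, because numpy_initialized is initialized before main starts. In that case the "initialization trick" should not be used. Instead, init_numpy() should be called explicitly after Py_Initialize().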

More sophisticated solution:

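Note: For why setting PyArray_API is needed at all, see this SO answer. In short, it postpones the resolution of numpy's symbols until run time, so numpy's shared object is not needed at link time and does not have to be on the dynamic library path (Python's system path is enough).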
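The run-time lookup described in this note goes through a PyCapsule: the providing module stores a pointer to its C API table in a capsule object, and the consumer fetches that pointer after the module has been imported. Below is a minimal sketch of the mechanism. It uses the standard library's datetime.datetime_CAPI capsule as a stand-in so that it runs without NumPy; the helper name capsule_pointer is made up, and ctypes.pythonapi assumes CPython.

```python
import ctypes
import datetime


def capsule_pointer(capsule, name):
    """Return the raw address stored in a PyCapsule (CPython only)."""
    get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
    get_pointer.restype = ctypes.c_void_p
    get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
    return get_pointer(capsule, name)


# The datetime module publishes its C API through a capsule; C code has to
# fetch the pointer from it before any of the API macros can be used.
capsule = datetime.datetime_CAPI
address = capsule_pointer(capsule, b"datetime.datetime_CAPI")

print(type(capsule).__name__)  # PyCapsule
print(hex(address))            # address of the C API table, differs per run
```

NumPy's _import_array() does the same kind of PyCapsule_GetPointer call and stores the result in PyArray_API. Until that call has happened, the pointer is null.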

The proposed solution is quick, but if more than one cpp file uses numpy, many instances of PyArray_API end up being initialized.

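This can be avoided if PyArray_API is not defined as static but declared extern in all translation units except one. For those translation units, the macro NO_IMPORT_ARRAY must be defined before numpy/arrayobject.h is included.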

We do, however, need one translation unit in which this symbol is defined. For that translation unit, the macro NO_IMPORT_ARRAY must not be defined.

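But without defining the macro PY_ARRAY_UNIQUE_SYMBOL we get only a static symbol, i.e. one that is not visible to the other translation units, so the linker will fail. The reason for this design: if there were two libraries and each defined a global PyArray_API, we would have multiple definitions of one symbol and the linker would fail, i.e. we could not use both libraries together.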

Thus, by defining PY_ARRAY_UNIQUE_SYMBOL as a library-specific name such as MY_PyArray_API before every include of numpy/arrayobject.h, we get our own name for PyArray_API, which will not clash with other libraries.

Putting it all together:

A: use_numpy.h - the header for including numpy functionality, i.e. numpy/arrayobject.h

//use_numpy.h
//your own name for the dedicated PyArray_API symbol
#define PY_ARRAY_UNIQUE_SYMBOL MY_PyArray_API
//INIT_NUMPY_ARRAY_CPP must be defined only in the one translation unit
//that initializes the symbol
#ifndef INIT_NUMPY_ARRAY_CPP
#define NO_IMPORT_ARRAY //for all the usual translation units
#endif
//now everything is set up, just include the numpy arrays:
#include <numpy/arrayobject.h>

B: init_numpy_api.cpp - the translation unit that initializes the global MY_PyArray_API:

//init_numpy_api.cpp
//first make clear that this is where MY_PyArray_API is initialized
#define INIT_NUMPY_ARRAY_CPP
//now include arrayobject.h, which defines
//void **MY_PyArray_API
#include "use_numpy.h"
//now the old trick with initialization:
int init_numpy() {
    import_array(); // prints and sets a Python error, then returns, if the import fails
    return 0;
}
const static int numpy_initialized = init_numpy();

C: just include use_numpy.h wherever you need numpy. It will declare extern void **MY_PyArray_API:

//example
#include "use_numpy.h"
...
PyArray_Check(obj); // works, no segmentation error

Warning: Do not forget that for the initialization trick to work, Py_Initialize() must already have been called.

Why you need it (kept for historical reasons):

When I build the extension with debug symbols:

extra_compile_args=['-fPIC', '-O0', '-g'],
extra_link_args=['-O0', '-g'],

and run it with gdb:

gdb --args python run_test.py
(gdb) run
--- Segmentation fault
(gdb) disass

I can see the following:

0x00007ffff1d2a6d9 <+20>:    mov    0x203260(%rip),%rax        # 0x7ffff1f2d940 <_ZL11PyArray_API>
0x00007ffff1d2a6e0 <+27>:    add    $0x10,%rax
=> 0x00007ffff1d2a6e4 <+31>:    mov    (%rax),%rax
...
(gdb) print $rax
$1 = 16

We should keep in mind that PyArray_Check is only a define for:

#define PyArray_Check(op) PyObject_TypeCheck(op, &PyArray_Type)

It seems that &PyArray_Type somehow uses a part of PyArray_API that is not initialized (it has the value 0).
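The value 16 printed by gdb can be reconstructed by hand. In NumPy's generated API header, PyArray_Type is entry 2 of the PyArray_API pointer table, so taking &PyArray_Type requires reading PyArray_API[2]. With a null table, that is a read from address 0 + 2 * sizeof(void*). A small sketch of the arithmetic follows. The helper slot_address is made up, and the slot index 2 is an assumption taken from NumPy's header rather than from the post itself.

```python
import ctypes

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)  # 8 on a 64-bit build
PYARRAY_TYPE_SLOT = 2  # assumed index of PyArray_Type in the API table


def slot_address(table_base, slot):
    """Address the CPU reads to fetch entry `slot` of a pointer table."""
    return table_base + slot * POINTER_SIZE


# PyArray_API was never initialized, so the table base is a null pointer.
fault_address = slot_address(0, PYARRAY_TYPE_SLOT)
print(fault_address)  # 16 on a 64-bit build, matching `print $rax`
```

This lines up with the disassembly: the first mov loads PyArray_API (0) into rax, add $0x10 moves to entry 2, and the second mov dereferences address 16, which is where the fault occurs.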

Let's take a look at cpp_parser.cpp after the preprocessor has run (compiled with the flag -E):

static void **PyArray_API= __null
...
static int
_import_array(void)
{
    PyArray_API = (void **)PyCapsule_GetPointer(c_api,...

So PyArray_API is static and is initialized via _import_array(void). That actually explains the warning I get during the build, that _import_array() is defined but not used: we never initialized PyArray_API.

Because PyArray_API is a static variable, it must be initialized in every compilation unit, i.e. in every .cpp file.

So we just need to do that, and import_array() seems to be the official way.

Since you are using Cython, the numpy API is already available through Cython's bundled includes. This is straightforward to try in a Jupyter notebook:

cimport numpy as np
from numpy cimport PyArray_Check
np.import_array()  # Attention!
def parse_ndarray(object ndarr):
    if PyArray_Check(ndarr):
        print("PyArray_Check Passed")
    else:
        print("PyArray_Check Failed")

I believe np.import_array() is the key here, because you are calling the numpy C API. Comment it out and try again, and the crash appears here as well.

import numpy as np
from array import array
ndarr = np.arange(3)
pyarr = array('i', range(3))
parse_ndarray(ndarr)
parse_ndarray(pyarr)
parse_ndarray("Trick or treat!")

Output:

PyArray_Check Passed
PyArray_Check Failed
PyArray_Check Failed