Wrapping an std::vector using boost::python vector_indexing_suite

Keywords: vector, suite, wrapping, std, indexing, boost, python. Updated: 2023-10-16

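I am developing a C++ library with Python bindings (using boost::python) that represents data stored in a file. Most of my semi-technical users will interact with it from Python, so I need to make it as Pythonic as possible. However, C++ programmers will also use the API, so I do not want to compromise on the C++ side to accommodate the Python bindings.

A large part of the library will be made of containers. To make them intuitive for Python users, I would like them to behave like Python lists, i.e.: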

# an example compound class
class Foo:
    def __init__( self, _val ):
        self.val = _val
# add it to a list
foo = Foo(0.0)
vect = []
vect.append(foo)
# change the value of the *original* instance
foo.val = 666.0
# which also changes the instance inside the container
print vect[0].val # outputs 666.0

Test setup

#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>
#include <boost/python/register_ptr_to_python.hpp>
#include <boost/shared_ptr.hpp>
#include <vector>
struct Foo {
    double val;
    Foo(double a) : val(a) {}
    bool operator == (const Foo& f) const { return val == f.val; }
};
/* insert the test module wrapping code here */
int main() {
    Py_Initialize();
    inittest();
    boost::python::object globals = boost::python::import("__main__").attr("__dict__");
    boost::python::exec(
        "import test\n"
        "foo = test.Foo(0.0)\n"         // make a new Foo instance
        "vect = test.FooVector()\n"     // make a new vector of Foos
        "vect.append(foo)\n"            // add the instance to the vector
        "foo.val = 666.0\n"             // assign a new value to the instance
                                        //   which should change the value in vector
        "print 'Foo =', foo.val\n"      // and print the results
        "print 'vector[0] =', vect[0].val\n",
        globals, globals
    );
    return 0;
}

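The `shared_ptr` way

Using shared_ptr, I can get the same behaviour as above, but it also means that I have to represent all data in C++ using shared pointers, which is not nice from many points of view.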

BOOST_PYTHON_MODULE( test ) {
    // wrap Foo
    boost::python::class_< Foo, boost::shared_ptr<Foo> >("Foo", boost::python::init<double>())
        .def_readwrite("val", &Foo::val);
    // wrap vector of shared_ptr Foos
    boost::python::class_< std::vector < boost::shared_ptr<Foo> > >("FooVector")
        .def(boost::python::vector_indexing_suite<std::vector< boost::shared_ptr<Foo> >, true >());
}

In my test setup, this produces the same output as pure Python:

Foo = 666.0
vector[0] = 666.0

The `vector<Foo>` way

Using the vector directly gives a nice, clean setup on the C++ side. However, the result does not behave in the same way as pure Python.

BOOST_PYTHON_MODULE( test ) {
    // wrap Foo
    boost::python::class_< Foo >("Foo", boost::python::init<double>())
        .def_readwrite("val", &Foo::val);
    // wrap vector of Foos
    boost::python::class_< std::vector < Foo > >("FooVector")
        .def(boost::python::vector_indexing_suite<std::vector< Foo > >());
}

This yields:

Foo = 666.0
vector[0] = 0.0

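This is "wrong": changing the original instance did not change the value inside the container.

I hope I don't want too much

Interestingly, this code works no matter which of the two encapsulations I use: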

footwo = vect[0]
footwo.val = 555.0
print vect[0].val

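This means that boost::python is able to deal with "fake shared ownership" (via its by_proxy return mechanism). Is there any way to achieve the same effect when inserting new elements?

However, if the answer is no, I would love to hear other suggestions: is there an example in the Python toolkit where a similar collection encapsulation is implemented, but which does not behave like a Python list?

Thanks a lot for reading this far :)

Due to the semantic differences between the languages, it is often very difficult to apply a single reusable solution to all scenarios when collections are involved. The biggest issue is that while Python collections directly support references, C++ collections require a level of indirection, such as having shared_ptr element types. Without this indirection, C++ collections cannot support the same functionality as Python collections. For instance, consider two indexes referring to the same object: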

s = Spam()
spams = []
spams.append(s)
spams.append(s)

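Without pointer-like element types, a C++ collection cannot have two indexes referring to the same object. Nevertheless, depending on usage and needs, there may be options that give Python users a Pythonic-ish interface while still maintaining a single implementation for C++.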
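The difference can be modelled in plain Python. The sketch below is only an illustration: `copy.copy` stands in for the copy that a value-type C++ collection would make on insertion, and the names `by_reference` and `by_value` are hypothetical.

```python
import copy


class Spam:
    def __init__(self, val):
        self.val = val


# Python list: both indexes hold a reference to the same object.
s = Spam(1)
by_reference = [s, s]
by_reference[0].val = 2   # also visible through by_reference[1] and s

# Hypothetical model of a value-type C++ collection: each insertion
# stores an independent copy, as push_back on a vector of Spam would.
by_value = [copy.copy(s), copy.copy(s)]
by_value[0].val = 3       # affects only the first copy
```

In the reference case a change made through one index is visible through the other; in the value case the two elements are unrelated objects.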

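The most Pythonic solution would be to use a custom converter that converts a Python iterable object to a C++ collection. See this answer for implementation details. Consider this option if:
- The collection's elements are cheap to copy.
- The C++ functions operate only on rvalue types (i.e. std::vector<> or const std::vector<>&). This limitation prevents C++ from making changes to the Python collection or its elements.

Alternatively, enhance the vector_indexing_suite capabilities, reusing as much of its functionality as possible, such as its proxies for safely handling index deletion and reallocation of the underlying collection:
- Expose the model with a custom HeldType that functions as a smart pointer and delegates to either the instance or the element proxy objects returned from vector_indexing_suite.
- Monkey patch the collection's methods that insert elements into the collection, so that the custom HeldType is set to delegate to an element proxy.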

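When exposing a class to Boost.Python, the HeldType is the type of object that gets embedded within a Boost.Python object. When accessing the wrapped type's object, Boost.Python invokes get_pointer() for the HeldType. The object_holder class below provides the ability to return a handle to either an instance it owns or to an element proxy: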

/// @brief smart pointer type that will delegate to a python
///        object if one is set.
template <typename T>
class object_holder
{
public:
  typedef T element_type;
  object_holder(element_type* ptr)
    : ptr_(ptr),
      object_()
  {}
  element_type* get() const
  {
    if (!object_.is_none())
    {
      return boost::python::extract<element_type*>(object_)();
    }
    return ptr_ ? ptr_.get() : NULL;
  }
  void reset(boost::python::object object)
  {
    // Verify the new object (not the currently held object_) holds
    // the expected element before delegating to it.
    boost::python::extract<element_type*> extractor(object);
    if (!extractor.check()) return;
    object_ = object;
    ptr_.reset();
  }
private:
  boost::shared_ptr<element_type> ptr_;
  boost::python::object object_;
};
/// @brief Helper function used to extract the pointed to object from
///        an object_holder.  Boost.Python will use this through ADL.
template <typename T>
T* get_pointer(const object_holder<T>& holder)
{
  return holder.get();
}

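With the indirection supported, the only thing remaining is patching the collection to set the object_holder. One clean and reusable way to support this is to use def_visitor. This is a generic interface that allows class_ objects to be extended non-intrusively. For instance, vector_indexing_suite uses this capability.

The custom_vector_indexing_suite class below monkey patches the append() method to delegate to the original method and then invokes object_holder.reset() with a proxy to the newly set element. This results in the object_holder referring to the element contained within the collection.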

/// @brief Indexing suite that resets the element's HeldType to
///        that of the proxy during element insertion.
template <typename Container,
          typename HeldType>
class custom_vector_indexing_suite
  : public boost::python::def_visitor<
      custom_vector_indexing_suite<Container, HeldType>>
{
private:
  friend class boost::python::def_visitor_access;
  template <typename ClassT>
  void visit(ClassT& cls) const
  {
    // Define vector indexing support.
    cls.def(boost::python::vector_indexing_suite<Container>());
    // Monkey patch element setters with custom functions that
    // delegate to the original implementation then obtain a 
    // handle to the proxy.
    cls
      .def("append", make_append_wrapper(cls.attr("append")))
      // repeat for __setitem__ (slice and non-slice) and extend
      ;
  }
  /// @brief Returns a patched 'append' function.
  static boost::python::object make_append_wrapper(
    boost::python::object original_fn)
  {
    namespace python = boost::python;
    return python::make_function([original_fn](
          python::object self,
          HeldType& value)
        {
          // Copy into the collection.
          original_fn(self, value.get());
          // Reset handle to delegate to a proxy for the newly copied element.
          value.reset(self[-1]);
        },
      // Call policies.
      python::default_call_policies(),
      // Describe the signature.
      boost::mpl::vector<
        void,           // return
        python::object, // self (collection)
        HeldType>()     // value
      );
  }
};

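Wrapping needs to occur at runtime, and custom functor objects cannot be defined directly on the class via def(), so the make_function() function must be used. For functors, it requires both call policies and an MPL front-extensible sequence representing the signature.

Here is a complete example that demonstrates using object_holder to delegate to proxies and custom_vector_indexing_suite to patch the collection.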
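The delegation idea behind object_holder and the patched append() can be modelled in pure Python. This is only a rough sketch of the concept, not Boost.Python code: `ValueVector`, `Holder` and `patched_append` are hypothetical names, and `copy.copy` stands in for the copy made when an element is inserted into a std::vector.

```python
import copy


class Spam:
    def __init__(self, val):
        self.val = val


class ValueVector:
    """Hypothetical stand-in for a value-type vector: stores copies."""

    def __init__(self):
        self._items = []

    def append(self, item):
        self._items.append(copy.copy(item))

    def __getitem__(self, index):
        return self._items[index]


class Holder:
    """Hypothetical stand-in for object_holder: forwards attribute
    access to whatever target it currently delegates to."""

    def __init__(self, instance):
        object.__setattr__(self, "_target", instance)

    def reset(self, target):
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, "_target"), name, value)


def patched_append(vector, holder):
    # Copy the current target into the collection ...
    vector.append(holder._target)
    # ... then make the handle delegate to the newly stored element.
    holder.reset(vector[-1])


spam = Holder(Spam(5))
spams = ValueVector()
patched_append(spams, spam)
spam.val = 21            # now writes to the element stored at index 0
patched_append(spams, spam)
spam.val = 100           # now writes to the element stored at index 1
```

After the second append the handle follows the most recently inserted element, so the element at index 0 keeps its earlier value. The container itself still provides no indirection.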

#include <vector>
#include <boost/python.hpp>
#include <boost/python/suite/indexing/vector_indexing_suite.hpp>
#include <boost/shared_ptr.hpp>
/// @brief Mockup type.
struct spam
{
  int val;
  spam(int val) : val(val) {}
  bool operator==(const spam& rhs) const { return val == rhs.val; }
};
/// @brief Mockup function that operates on a collection of spam instances.
void modify_spams(std::vector<spam>& spams)
{
  for (auto& spam : spams)
    spam.val *= 2;
}
/// @brief smart pointer type that will delegate to a python
///        object if one is set.
template <typename T>
class object_holder
{
public:
  typedef T element_type;
  object_holder(element_type* ptr)
    : ptr_(ptr),
      object_()
  {}
  element_type* get() const
  {
    if (!object_.is_none())
    {
      return boost::python::extract<element_type*>(object_)();
    }
    return ptr_ ? ptr_.get() : NULL;
  }
  void reset(boost::python::object object)
  {
    // Verify the new object (not the currently held object_) holds
    // the expected element before delegating to it.
    boost::python::extract<element_type*> extractor(object);
    if (!extractor.check()) return;
    object_ = object;
    ptr_.reset();
  }
private:
  boost::shared_ptr<element_type> ptr_;
  boost::python::object object_;
};
/// @brief Helper function used to extract the pointed to object from
///        an object_holder.  Boost.Python will use this through ADL.
template <typename T>
T* get_pointer(const object_holder<T>& holder)
{
  return holder.get();
}
/// @brief Indexing suite that resets the element's HeldType to
///        that of the proxy during element insertion.
template <typename Container,
          typename HeldType>
class custom_vector_indexing_suite
  : public boost::python::def_visitor<
      custom_vector_indexing_suite<Container, HeldType>>
{
private:
  friend class boost::python::def_visitor_access;
  template <typename ClassT>
  void visit(ClassT& cls) const
  {
    // Define vector indexing support.
    cls.def(boost::python::vector_indexing_suite<Container>());
    // Monkey patch element setters with custom functions that
    // delegate to the original implementation then obtain a 
    // handle to the proxy.
    cls
      .def("append", make_append_wrapper(cls.attr("append")))
      // repeat for __setitem__ (slice and non-slice) and extend
      ;
  }
  /// @brief Returns a patched 'append' function.
  static boost::python::object make_append_wrapper(
    boost::python::object original_fn)
  {
    namespace python = boost::python;
    return python::make_function([original_fn](
          python::object self,
          HeldType& value)
        {
          // Copy into the collection.
          original_fn(self, value.get());
          // Reset handle to delegate to a proxy for the newly copied element.
          value.reset(self[-1]);
        },
      // Call policies.
      python::default_call_policies(),
      // Describe the signature.
      boost::mpl::vector<
        void,           // return
        python::object, // self (collection)
        HeldType>()     // value
      );
  }
  // .. make_setitem_wrapper
  // .. make_extend_wrapper
};
BOOST_PYTHON_MODULE(example)
{
  namespace python = boost::python;
  // Expose spam.  Use a custom holder to allow for transparent delegation
  // to different instances.
  python::class_<spam, object_holder<spam>>("Spam", python::init<int>())
    .def_readwrite("val", &spam::val)
    ;
  // Expose a vector of spam.
  python::class_<std::vector<spam>>("SpamVector")
    .def(custom_vector_indexing_suite<
      std::vector<spam>, object_holder<spam>>())
    ;
  python::def("modify_spams", &modify_spams);
}

Interactive usage:

>>> import example
>>> spam = example.Spam(5)
>>> spams = example.SpamVector()
>>> spams.append(spam)
>>> assert(spams[0].val == 5)
>>> spam.val = 21
>>> assert(spams[0].val == 21)
>>> example.modify_spams(spams)
>>> assert(spam.val == 42)
>>> spams.append(spam)
>>> spam.val = 100
>>> assert(spams[1].val == 100)
>>> assert(spams[0].val == 42) # The container does not provide indirection.

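As vector_indexing_suite is still being used, the underlying C++ container should only be modified through the Python object's API. For instance, invoking push_back on the container may cause a reallocation of the underlying memory and cause problems with existing Boost.Python proxies. On the other hand, the elements themselves can safely be modified, as was done via the modify_spams() function above.

Unfortunately, the answer is no, you can't do what you want. In Python, everything is a pointer, and lists are containers of pointers. The C++ vector of shared pointers works because the underlying data structure is more or less equivalent to a Python list. What you are requesting is to have a C++ vector of allocated memory act like a vector of pointers, which can't be done.

Let's see what is happening in Python lists, with C++-equivalent pseudocode: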

foo = Foo(0.0)     # Foo* foo = new Foo(0.0)
vect = []          # std::vector<Foo*> vect
vect.append(foo)   # vect.push_back(foo)

At this point, foo and vect[0] both point to the same allocated memory, so changing *foo changes *vect[0].

Now with the vector<Foo> version:

foo = Foo(0.0)      # Foo* foo = new Foo(0.0)
vect = FooVector()  # std::vector<Foo> vect
vect.append(foo)    # vect.push_back(*foo)

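Here, vect[0] has its own allocated memory and is a copy of *foo. Fundamentally, you cannot make vect[0] be the same memory as *foo.

On a side note, be careful with lifetime management of footwo when using std::vector<Foo>: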
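The two pseudocode cases can be compared side by side in plain Python. The sketch below is an assumption-laden model rather than the real wrapper: `ValueVector` is a hypothetical class whose append() copies its argument, mimicking vect.push_back(*foo).

```python
import copy


class Foo:
    def __init__(self, val):
        self.val = val


class ValueVector:
    """Hypothetical stand-in for std::vector<Foo>; not the Boost.Python wrapper."""

    def __init__(self):
        self._items = []

    def append(self, item):
        # Mirrors vect.push_back(*foo): the element is a copy of the argument.
        self._items.append(copy.copy(item))

    def __getitem__(self, index):
        return self._items[index]


foo = Foo(0.0)

py_list = []
py_list.append(foo)   # stores a reference to foo

vect = ValueVector()
vect.append(foo)      # stores a copy of foo

foo.val = 666.0       # seen through py_list[0], not through vect[0]
```

The list element is the very same object as foo, while the element in the value-type model keeps the value it was copied with.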

footwo = vect[0]    # Foo* footwo = &vect[0]

A subsequent append may need to move the storage allocated for the vector, and may invalidate footwo (&vect[0] might change).