Pass a vector of custom structs by reference to a boost::compute closure or function

Keywords: boost, compute, closure, function, reference, custom struct, vector. Updated: 2023-10-16

I am fairly new to OpenCL and am trying to learn how to use boost::compute properly. Consider the following code:

#include <iostream>
#include <vector>
#include <boost/compute.hpp>
const cl_int cell_U_size{ 4 };
#pragma pack (push,1)
struct Cell
{
    cl_double U[cell_U_size];
};
#pragma pack (pop)
BOOST_COMPUTE_ADAPT_STRUCT(Cell, Cell, (U));
int main(int argc, char* argv[])
{
    using namespace boost;
    auto device = compute::system::default_device();
    auto context = compute::context(device);
    auto queue = compute::command_queue(context, device);
    std::vector<Cell> host_Cells;
    host_Cells.reserve(10);
    for (auto j = 0; j < host_Cells.capacity(); ++j) {
        host_Cells.emplace_back(Cell());
        for (auto i = 0; i < cell_U_size; ++i) {
            host_Cells.back().U[i] = static_cast<cl_double>(i+j);
        }
    }
    std::cout << "Before:\n";
    for (auto const& hc : host_Cells) {
        for (auto const& u : hc.U)
            std::cout << " " << u;
        std::cout << "\n";
    }
    compute::vector<Cell> device_Cells(host_Cells.size(), context);
    auto f = compute::copy_async(host_Cells.begin(), host_Cells.end(), device_Cells.begin(), queue);
    try {
        BOOST_COMPUTE_CLOSURE(Cell, Step1, (Cell cell), (cell_U_size), {
            for (int i = 0; i < cell_U_size; ++i) {
                cell.U[i] += 1.0;
            }
            return cell;
        });
        f.wait(); // Wait for data to finish being copied
        compute::transform(device_Cells.begin(), device_Cells.end(), device_Cells.begin(), Step1, queue);
        //BOOST_COMPUTE_CLOSURE(void, Step2, (Cell &cell), (cell_U_size), {
        //  for (int i = 0; i < cell_U_size; ++i) {
        //      cell.U[i] += 1.0;
        //  }
        //});
        //compute::for_each(device_Cells.begin(), device_Cells.end(), Step2, queue);
        compute::copy(device_Cells.begin(), device_Cells.end(), host_Cells.begin(), queue);
    }
    catch (std::exception &e) {
        std::cout << e.what() << std::endl;
        throw;
    }
    std::cout << "After:\n";
    for (auto const& hc : host_Cells) {
        for (auto const& u : hc.U)
            std::cout << " " << u;
        std::cout << "\n";
    }
}

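I have a vector of custom structs (in reality much more complex than the one shown here) that I want to process on the GPU. With the uncommented BOOST_COMPUTE_CLOSURE, compute::transform passes the structs by value, processes them, and copies them back.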

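I would like to pass them by reference instead, as in the commented-out BOOST_COMPUTE_CLOSURE used with compute::for_each, but when the program runs the kernel fails to compile (Build Program Failure), and I have not found any documentation that explains how this should be done.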

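I know I can get pass-by-reference behavior (really pointers, since it is C99) by using BOOST_COMPUTE_STRINGIZE_SOURCE and passing a pointer to the whole vector of structs, but I would like to use the compute::... functions because they seem more elegant.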

If you define the macro BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION and building the OpenCL program fails, the program source and the build log are written to stdout.
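
A minimal sketch of enabling that macro, assuming it has to be visible before any Boost.Compute header is included (defining it on the compiler command line should work equally well):

```cpp
// Define the macro before including any Boost.Compute header
// (alternatively, pass -DBOOST_COMPUTE_DEBUG_KERNEL_COMPILATION to the compiler).
#define BOOST_COMPUTE_DEBUG_KERNEL_COMPILATION
#include <boost/compute.hpp>
```

With this in place, a failing closure such as Step2 should print the generated OpenCL source together with the compiler's error message, which makes the cause of the Build Program Failure visible.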

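You cannot pass by reference in OpenCL C, which is what you are trying to do in BOOST_COMPUTE_CLOSURE. I understand that you want to pass a __global pointer to your closure and modify the value of the variable in global memory rather than a local copy of that value. I don't think this is supported in Boost.Compute, because in for_each (and the other algorithms) Boost.Compute always passes values to your function/closure.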
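
The by-value behavior can be illustrated with a host-side analogy. The sketch below uses only the C++ standard library, not Boost.Compute, and the helper names `bump_copies_only` and `bump_with_transform` are made up for illustration. It shows why a callable that receives its argument by value must return the result and have the algorithm write it back.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-in for the Cell struct from the question.
struct Cell {
    double U[4];
};

// The callable takes Cell by value, so it only modifies a temporary copy;
// the elements stored in the vector stay unchanged.
void bump_copies_only(std::vector<Cell>& cells) {
    std::for_each(cells.begin(), cells.end(), [](Cell cell) {
        for (double& u : cell.U) u += 1.0;
    });
}

// The callable returns the modified copy and std::transform stores it
// back into the same position, so the vector is updated.
void bump_with_transform(std::vector<Cell>& cells) {
    std::transform(cells.begin(), cells.end(), cells.begin(), [](Cell cell) {
        for (double& u : cell.U) u += 1.0;
        return cell;
    });
}
```

This mirrors the Step1 closure in the question: compute::transform with the input range also used as the output range is the in-place update that the value-passing model allows.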

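Of course, you can always implement a workaround: add a unary operator& or implement a custom device iterator. In the example shown, however, that would only reduce performance, because it leads to non-coalesced memory reads and writes. If you have a very complex array of structures (AoS), try changing it to a structure of arrays (SoA) and/or breaking up your structure.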
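
The AoS-to-SoA change suggested above can be sketched on the host side as follows. This is plain C++ under stated assumptions: `CellsSoA`, `to_soa` and `to_aos` are hypothetical names, and the struct is reduced to the four doubles from the question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t cell_U_size = 4;

// Array-of-structures element, mirroring the Cell struct from the question.
struct Cell {
    double U[cell_U_size];
};

// Structure-of-arrays layout (hypothetical): one contiguous array per component.
struct CellsSoA {
    std::array<std::vector<double>, cell_U_size> U;
};

// Split an AoS vector into one contiguous array per component.
CellsSoA to_soa(const std::vector<Cell>& cells) {
    CellsSoA soa;
    for (std::size_t i = 0; i < cell_U_size; ++i) {
        soa.U[i].reserve(cells.size());
        for (const Cell& c : cells) soa.U[i].push_back(c.U[i]);
    }
    return soa;
}

// Reassemble the AoS vector; all component arrays must have the same length.
std::vector<Cell> to_aos(const CellsSoA& soa) {
    const std::size_t n = soa.U[0].size();
    for (const auto& component : soa.U) assert(component.size() == n);
    std::vector<Cell> cells(n);
    for (std::size_t i = 0; i < cell_U_size; ++i)
        for (std::size_t j = 0; j < n; ++j)
            cells[j].U[i] = soa.U[i][j];
    return cells;
}
```

With this layout, each component array could be copied into its own compute::vector<double> and processed with compute::transform, so that neighboring work-items read and write neighboring doubles. Whether that pays off depends on how the real, more complex struct is accessed.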