C++ library to build up execution pipeline

Keywords: c++, pipeline, execution, build. Updated: 2023-10-16

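I have been looking for a reusable execution pipeline library in C++ (a job scheduler library?). I could not find anything in Boost, so I eventually ended up with two candidates: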

google-concurrency-library
libpipeline

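Am I missing any other candidates? Has anyone used them? How good are they at parallel I/O and multi-threading? These libraries still seem to lack dependency handling. For example, it is not clear to me how I would write something like: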

$ cat /dev/urandom | tr P Q | head -3

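In this very simple case the pipeline is driven from the bottom up: the first process, cat, stops executing once the head process stops pulling data.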
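The pull-driven behaviour described above can be sketched in plain C++ with no library at all. This is only an illustration, not how either candidate library works: the names `Source`, `endless_lines`, `replace_char`, `take` and `collect` are hypothetical, and the endless producer merely stands in for `cat /dev/urandom`.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// A pull-based stage: each call yields the next item, or std::nullopt when exhausted.
template <typename T>
using Source = std::function<std::optional<T>()>;

// Hypothetical endless producer (stand-in for `cat /dev/urandom`).
// Yields "P0", "P1", ... and counts how often it was pulled.
// `pulls` must outlive the returned source.
Source<std::string> endless_lines(std::size_t& pulls) {
    return [&pulls, n = std::size_t{0}]() mutable -> std::optional<std::string> {
        ++pulls;
        return "P" + std::to_string(n++);
    };
}

// Stand-in for `tr P Q`: pulls one item from upstream and rewrites it.
Source<std::string> replace_char(Source<std::string> upstream, char from, char to) {
    return [upstream = std::move(upstream), from, to]() -> std::optional<std::string> {
        auto line = upstream();
        if (!line) return std::nullopt;
        for (char& c : *line) {
            if (c == from) c = to;
        }
        return line;
    };
}

// Stand-in for `head -n`: stops pulling from upstream after `limit` items.
Source<std::string> take(Source<std::string> upstream, std::size_t limit) {
    return [upstream = std::move(upstream), limit,
            taken = std::size_t{0}]() mutable -> std::optional<std::string> {
        if (taken >= limit) return std::nullopt;
        ++taken;
        return upstream();
    };
}

// The sink drives the whole pipeline by pulling until the source is exhausted.
std::vector<std::string> collect(Source<std::string> src) {
    std::vector<std::string> out;
    while (auto item = src()) {
        out.push_back(std::move(*item));
    }
    return out;
}
```

Calling `collect(take(replace_char(endless_lines(pulls), 'P', 'Q'), 3))` pulls from the endless producer exactly three times, because nothing upstream runs unless the sink asks for another item.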

However, I do not see how I could benefit from multi-threading and parallel I/O in a case such as:

$ cat /raid1/file1 /raid2/file2 | tr P Q > /tmp/file3

I have no way to say: run tr on 7 threads when 8 processors are available.
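For the CPU-bound part alone, the "N minus one threads" idea can be approximated with the standard library. The sketch below is an assumption-laden illustration, not a feature of either library: `worker_count` and `parallel_tr` are hypothetical names, the whole input is assumed to already sit in memory, and the parallel reading of /raid1 and /raid2 is not addressed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Number of worker threads: all reported hardware threads but one, and at
// least one (hardware_concurrency() may return 0 when the value is unknown).
unsigned worker_count() {
    const unsigned hw = std::thread::hardware_concurrency();
    return hw > 1 ? hw - 1 : 1;
}

// Replaces every `from` with `to` in `data`. The buffer is split into
// contiguous, non-overlapping chunks, each handled by its own thread.
void parallel_tr(std::string& data, char from, char to, unsigned workers) {
    if (workers == 0) workers = 1;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < workers; ++i) {
        const std::size_t begin = std::min(data.size(), i * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        pool.emplace_back([&data, begin, end, from, to] {
            std::replace(data.data() + begin, data.data() + end, from, to);
        });
    }
    for (auto& t : pool) t.join();
}
```

A call such as `parallel_tr(buffer, 'P', 'Q', worker_count())` uses 7 threads on a machine that reports 8. Because the chunks do not overlap, the threads need no locking.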

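What you are looking for is a dataflow framework. A pipeline is a specialized form of dataflow in which every component has exactly one consumer and one producer.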

Boost has support for dataflow, but unfortunately I am not familiar with it. Here is the link: http://dancinghacker.com/code/dataflow/dataflow/introduction/dataflow.html

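In any case, you should write your components as separate programs and connect them with Unix pipes, especially if your data is (or can easily be converted to) lines of text.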

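Another option is to write your own dataflow. It is not that hard, especially when you have restrictions (I mean a pipe: 1 consumer / 1 producer), because you do not need to implement a full dataflow framework. A pipe is just a matter of binding functions together and passing the result of one as the argument of the next. A dataflow framework is more about the component interface/pattern and the binding technique. (It is fun, I have written one.)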
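To illustrate the "binding functions together" idea, here is a minimal sketch that assumes C++14 or later. The helper `make_pipe` and the two example stages are hypothetical names invented for this illustration. It is not a dataflow framework and has no threading.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Base case: a single stage is already a pipeline.
template <typename F>
auto make_pipe(F f) {
    return f;
}

// Binds stages left to right: make_pipe(f, g, h)(x) == h(g(f(x))).
template <typename F, typename... Rest>
auto make_pipe(F first, Rest... rest) {
    return [first = std::move(first),
            tail = make_pipe(std::move(rest)...)](auto&& input) {
        return tail(first(std::forward<decltype(input)>(input)));
    };
}

// Example stage: the equivalent of `tr P Q` on one string.
std::string tr_p_to_q(std::string s) {
    for (char& c : s) {
        if (c == 'P') c = 'Q';
    }
    return s;
}

// Example stage: consumes a string and produces its length.
std::size_t count_chars(const std::string& s) {
    return s.size();
}
```

With these definitions, `make_pipe(tr_p_to_q, count_chars)` yields a callable that takes a string and returns a length. Each stage has one producer and one consumer, which is exactly the restriction that keeps the binding code this small.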

I would give Threading Building Blocks http://threadingbuildingblocks.org/ a try. It is open source and cross-platform. The Wikipedia article is pretty good: http://en.wikipedia.org/wiki/Intel_Threading_Building_Blocks

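I just read about RaftLib today. It uses templates and classes to create pipeline elements called "kernels". In addition to parallel data flows, it supports serial pipelines like the Bash examples you showed. From the Hello world example on the front page: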

#include <raft>
#include <raftio>
#include <cstdlib>
#include <string>
class hi : public raft::kernel
{
public:
    hi() : raft::kernel()
    {
       output.addPort< std::string >( "0" ); 
    }
    virtual raft::kstatus run()
    {
        output[ "0" ].push( std::string( "Hello World\n" ) );
        return( raft::stop ); 
    }
};

int
main( int argc, char **argv )
{
    /** instantiate print kernel **/
    raft::print< std::string > p;
    /** instantiate hello world kernel **/
    hi hello;
    /** make a map object **/
    raft::map m;
    /** add kernels to map, both hello and p are executed concurrently **/
    m += hello >> p;
    /** execute the map **/
    m.exe();
    return( EXIT_SUCCESS );
}