C++ Multithreading nested for loops

Keywords: loop, for, nested, multithreading, C++. Updated: 2023-10-16

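First off, I know very little about multithreading, and I'm having a hard time finding the best way to optimize this code, but multithreading seems to be the path I should take.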

double
applyFilter(struct Filter *filter, cs1300bmp *input, cs1300bmp *output)
{
    long long cycStart, cycStop;
    cycStart = rdtscll();
    output -> width = input -> width;
    output -> height = input -> height;
    int temp1 = output -> width;
    int temp2 = output -> height;
    int width=temp1-1;
    int height=temp2 -1;
    int getDivisorVar= filter -> getDivisor();  
    int t0, t1, t2, t3, t4, t5, t6, t7, t8, t9;
    int keep0= filter -> get(0,0);
    int keep1= filter -> get(1,0);
    int keep2= filter -> get(2,0);
    int keep3= filter -> get(0,1);
    int keep4= filter -> get(1,1);
    int keep5= filter -> get(2,1);
    int keep6= filter -> get(0,2);
    int keep7= filter -> get(1,2);
    int keep8= filter -> get(2,2);

    //Declare variables before the loop
    int plane, row, col;    
    for (plane=0; plane < 3; plane++) {
        for(row=1; row < height ; row++) {
            for (col=1; col < width; col++) {
                t0 = (input -> color[plane][row - 1][col - 1]) * keep0;
                t1 = (input -> color[plane][row][col - 1]) * keep1;
                t2 = (input -> color[plane][row + 1][col - 1]) * keep2;
                t3 = (input -> color[plane][row - 1][col]) * keep3;
                t4 = (input -> color[plane][row][col]) * keep4;
                t5 = (input -> color[plane][row + 1][col]) * keep5;
                t6 = (input -> color[plane][row - 1][col + 1]) * keep6;
                t7 = (input -> color[plane][row][col + 1]) * keep7;
                t8 = (input -> color[plane][row + 1][col + 1]) * keep8;
                // NEW LINE HERE
                t9 = t0 + t1 + t2 + t3 + t4 + t5 + t6 + t7 + t8;
                t9 = t9 / getDivisorVar;
                if ( t9 < 0 ) {
                    t9 = 0;
                }
                if ( t9  > 255 ) {
                    t9 = 255;
                } 
                output -> color[plane][row][col] = t9;
            } ....

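Most of that code is probably not needed to answer the question, but it does give some context. Since the first of the three "for" loops only runs from 0 to 2, I was hoping there is a way to have the two inner "for" loops run simultaneously for the different "plane" values. Is that possible? If so, would it actually make my program faster?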

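I would also look into OpenMP. It is a great API that lets you do threading in a very simple way using pragmas. OpenMP code compiles on many platforms; you just need to make sure your compiler supports it!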

I have some code with for loops nested 8 levels deep, and OpenMP threads it very nicely.
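As a rough illustration of the pragma approach, here is a minimal sketch of the question's loop nest with an OpenMP directive on it. The `Image` struct and the `Kernel` alias are hypothetical stand-ins for the question's `cs1300bmp` and `Filter` types, which are not shown in full, so treat the names and layout as assumptions. With GCC or Clang the directive only takes effect when compiling with `-fopenmp`; without that flag the pragma is ignored and the loop simply runs serially.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical 3x3 filter weights, indexed as [row offset + 1][col offset + 1].
using Kernel = std::array<std::array<int, 3>, 3>;

// Hypothetical stand-in for cs1300bmp: three colour planes stored
// contiguously and addressed as (plane, row, col).
struct Image {
    int width;
    int height;
    std::vector<int> data;

    Image(int w, int h)
        : width(w), height(h), data(3 * static_cast<std::size_t>(w) * h, 0) {}

    std::size_t index(int plane, int row, int col) const {
        return (static_cast<std::size_t>(plane) * height + row) * width + col;
    }
    int& at(int plane, int row, int col) { return data[index(plane, row, col)]; }
    int at(int plane, int row, int col) const { return data[index(plane, row, col)]; }
};

void applyFilterOmp(const Kernel& kernel, int divisor, const Image& in, Image& out)
{
    const int lastRow = in.height - 1;
    const int lastCol = in.width - 1;

    // collapse(2) merges the plane and row loops into one iteration space,
    // so there is more work to share out than just three planes.
    // Every iteration writes a different output pixel, so no locking is used.
    #pragma omp parallel for collapse(2)
    for (int plane = 0; plane < 3; ++plane) {
        for (int row = 1; row < lastRow; ++row) {
            for (int col = 1; col < lastCol; ++col) {
                int sum = 0;
                for (int dr = -1; dr <= 1; ++dr)
                    for (int dc = -1; dc <= 1; ++dc)
                        sum += in.at(plane, row + dr, col + dc) * kernel[dr + 1][dc + 1];
                // Clamp to the 0..255 range, as the original code does.
                out.at(plane, row, col) = std::max(0, std::min(255, sum / divisor));
            }
        }
    }
}
```

Calling `applyFilterOmp(kernel, divisor, input, output)` with `output` already sized like `input` fills the interior pixels and leaves the one-pixel border untouched, matching the bounds of the original loops.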

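Yes, this is entirely possible. In this case you shouldn't need to worry about access synchronization (i.e. race conditions), because the threads will be operating on different data sets: each one writes only its own plane of the output, and the input is only read.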

On a multi-core machine this will definitely speed up your code.

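You might want to have a look at std::thread (if you can use C++11) for a cross-platform threading implementation, since you did not specify your target platform. Or, better still, use the rest of the standard thread support library.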

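You could also consider detecting the number of cores and launching an appropriate number of threads, e.g. threadcount = min(planes, cores), and giving each worker function access to the data set of a single plane.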
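The threadcount = min(planes, cores) idea could look roughly like the sketch below. It is only an illustration under assumptions: the `Plane`, `Image` and `Kernel` aliases and both function names are made up for this example and are not the question's `cs1300bmp` or `Filter` types. Each worker touches only the planes assigned to it, which is why no mutex appears.

```cpp
#include <algorithm>
#include <array>
#include <thread>
#include <vector>

using Plane  = std::vector<std::vector<int>>;      // hypothetical: [row][col]
using Image  = std::vector<Plane>;                 // hypothetical: [plane][row][col]
using Kernel = std::array<std::array<int, 3>, 3>;  // hypothetical 3x3 weights

// Filters one plane. Reads `in`, writes only `out`, skips the 1-pixel border.
void filterPlane(const Plane& in, Plane& out, const Kernel& k, int divisor)
{
    const int rows = static_cast<int>(in.size());
    const int cols = rows > 0 ? static_cast<int>(in[0].size()) : 0;
    for (int r = 1; r + 1 < rows; ++r) {
        for (int c = 1; c + 1 < cols; ++c) {
            int sum = 0;
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc)
                    sum += in[r + dr][c + dc] * k[dr + 1][dc + 1];
            out[r][c] = std::max(0, std::min(255, sum / divisor));
        }
    }
}

// Precondition (assumed): `out` has the same dimensions as `in`.
void applyFilterThreaded(const Image& in, Image& out, const Kernel& k, int divisor)
{
    const unsigned planes = static_cast<unsigned>(in.size());
    // hardware_concurrency() may return 0 when the count is unknown.
    const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
    const unsigned threadCount = std::min(planes, cores);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        // Worker t handles planes t, t + threadCount, t + 2 * threadCount, ...
        workers.emplace_back([&in, &out, &k, divisor, planes, threadCount, t] {
            for (unsigned p = t; p < planes; p += threadCount)
                filterPlane(in[p], out[p], k, divisor);
        });
    }
    // join() is the only synchronization point: wait for every plane to finish.
    for (auto& w : workers) w.join();
}
```

With three planes this never starts more than three threads, so on a machine with more cores the extra ones stay idle; splitting each plane by rows would be one way to use them, at the cost of slightly more bookkeeping.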

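It certainly looks like you could break this up into threads, and you would probably see a good speed increase. However, your compiler is already going to try to unroll the loop for you and gain parallelism by vectorizing the instructions. Your gains may not be as large as you expect, especially if you are saturating the memory bus with reads from different locations.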

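What you might consider, if this is a 2D graphics operation, is trying OpenGL or something similar, since it will take advantage of the graphics hardware on your system, which has parallelism built in.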

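The threaded version of the code will be slower than the simple implementation, because in the threaded version a lot of time will be spent on synchronization. The threaded version will also suffer from poorer cache performance.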

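Also, there is a good chance that the compiler will unroll the outer for loop, which has only 3 iterations, and execute the iterations in parallel.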

You could try writing a threaded version and comparing the performance. Either way, it will be a useful experience.
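For the comparison itself, a small timing helper is enough. The sketch below uses std::chrono::steady_clock rather than the rdtscll() call from the question, purely because it is portable; the helper name `elapsedMs` and its `repeats` parameter are made up for this example.

```cpp
#include <chrono>

// Runs fn() `repeats` times and returns the fastest run in milliseconds.
// Taking the minimum reduces noise from other processes and cold caches.
// Returns -1.0 if repeats is not positive.
template <typename Fn>
double elapsedMs(Fn&& fn, int repeats = 5)
{
    using Clock = std::chrono::steady_clock;
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        const auto start = Clock::now();
        fn();
        const auto stop = Clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best)
            best = ms;
    }
    return best;
}
```

Wrapping the serial call and the threaded call in two lambdas and passing each to `elapsedMs` gives two numbers to compare. It is worth measuring with optimizations enabled and on an image large enough that thread start-up cost does not dominate.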

For a case like this, you could do worse than use a compiler that automatically converts for loops into threads.

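With code like this, a compiler can work out whether there are any data dependencies between iterations. If there are none, it knows it can safely split the for loop across multiple threads and place a standard thread synchronization at the end. Typically such compilers can also insert code that decides at run time whether the overhead of threading is outweighed by the benefit.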
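To make "data dependency between iterations" concrete, here is a small sketch contrasting two loops. Both functions are invented for illustration and are not taken from the question's code.

```cpp
#include <cstddef>
#include <vector>

// No dependency between iterations: iteration i reads and writes only v[i],
// so the iterations can be executed in any order or on different threads.
// The filter loop in the question has this shape for its output pixels.
void scaleAll(std::vector<int>& v, int factor)
{
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= factor;
}

// Dependency between iterations: iteration i needs the value that
// iteration i - 1 just wrote, so this loop cannot simply be split
// across threads as written.
void prefixSum(std::vector<int>& v)
{
    for (std::size_t i = 1; i < v.size(); ++i)
        v[i] += v[i - 1];
}
```

A parallelizing compiler has to prove that a loop looks like the first case before splitting it; anything that hides this from it, such as pointers that might alias, tends to make it fall back to serial code.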

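The only question is: do you have a compiler that does this? If so, it is by far the easiest way to get the benefits of threading for straightforward, almost overt parallelism like this.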

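I know Sun's C compiler did it (I think they were among the first to do so, and it may only be in the Solaris version of their compiler). I think Intel's compiler can too. I have my doubts about GCC (though I'd be very happy to be corrected on that point), and I'm not too sure about Microsoft's compiler either.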