Parallelize a loop using std::thread and good practices

Keywords: parallelize loop, std::thread. Last updated: 2023-10-16

Possible duplicate:
C++ 2011: std::thread: simple example to parallelize a loop?

Consider the following program, which distributes a computation over the elements of a vector (I have never used std::thread before):

// vectorop.cpp
// compilation: g++ -O3 -std=c++0x vectorop.cpp -o vectorop -lpthread
// execution: time ./vectorop 100 50000000 
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <vector>
#include <thread>
#include <cmath>
#include <cstdlib> // std::atol
#include <algorithm>
#include <numeric>
// Some calculation that takes some time
template<typename T> 
void f(std::vector<T>& v, unsigned int first, unsigned int last) {
    for (unsigned int i = first; i < last; ++i) {
        v[i] = std::sin(v[i])+std::exp(std::cos(v[i]))/std::exp(std::sin(v[i])); 
    }
}
// Main
int main(int argc, char* argv[]) {
    // Variables
    const int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    std::vector<std::thread> t;
    std::vector<double> v(n);
    // Initialization
    std::iota(v.begin(), v.end(), 0);
    // Start threads
    for (unsigned int i = 0; i < n; i += std::max(1, n/nthreads)) {
        // question 1: 
        // how to compute the first/last indexes attributed to each thread 
        // with a more "elegant" formula ?
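        // std::min needs both arguments to have the same type, hence the explicit std::size_t
        std::cout<<i<<" "<<std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())<<std::endl;
        t.push_back(std::thread(f<double>, std::ref(v), i, std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())));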
    }
    // Finish threads
    for (unsigned int i = 0; i < t.size(); ++i) {
        t[i].join();
    }
    // question 2: 
    // how to be sure that all threads are finished here ?
    // how to "wait" for the end of all threads ?
    // Finalization
    for (unsigned int i = 0; i < n; ++i) {
        x += v[i];
    }
    std::cout<<std::setprecision(15)<<x<<std::endl;
    return 0;
}

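Two questions are already embedded in the code.

A third question: is this code fully correct, or can it be written in a more elegant way with std::thread? I do not know the "good practices" for using std::thread...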

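Regarding the first question, how to compute the range each thread works on: I extracted the constants and gave them names to make the code easier to read. As a better practice I also used a lambda, which makes the code easier to modify: the code inside the lambda is used only here, whereas the function f can be called from other code throughout the program. Take advantage of this by putting shared code into functions and keeping code that is used only once in a lambda.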

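// Number of elements per thread (at least 1, so the loop always advances).
const size_t itemsPerThread = std::max(1, n/nthreads);
for (size_t nextIndex = 0; nextIndex < v.size(); nextIndex += itemsPerThread)
{
    const size_t beginIndex = nextIndex;
    // Clamp the last range to the end of the vector.
    const size_t endIndex = std::min(nextIndex + itemsPerThread, v.size());
    std::cout << beginIndex << " " << endIndex << std::endl;
    // The lambda captures the vector by reference and the indexes by value.
    t.push_back(std::thread([&v, beginIndex, endIndex]{ f(v, beginIndex, endIndex); }));
}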
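
One thing to note about a fixed-stride loop like this: when n is not a multiple of the thread count, it can start more threads than requested (n = 10 with 3 threads gives a stride of 3 and four ranges). A possible alternative, sketched here under the assumption that exactly the requested number of ranges is wanted, hands the remainder out one element at a time. The helper name splitRange is hypothetical and not part of the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): split [0, n) into exactly
// `parts` contiguous half-open ranges whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
splitRange(std::size_t n, std::size_t parts) {
    assert(parts > 0 && "need at least one part");
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    ranges.reserve(parts);
    const std::size_t base = n / parts;   // minimum size of every range
    const std::size_t extra = n % parts;  // the first `extra` ranges get one more element
    std::size_t begin = 0;
    for (std::size_t k = 0; k < parts; ++k) {
        const std::size_t end = begin + base + (k < extra ? 1 : 0);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

Each pair can then be passed to f as its first and last index. When parts is larger than n, the trailing ranges come out empty and can simply be skipped.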

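Advanced use cases would use a thread pool, but what that looks like depends on your application design, and the STL does not provide one. For a good example of a threading model, see the Qt framework. If you are just getting started with threads, save that for later.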

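The second question has already been answered in the comments. The std::thread::join function waits (blocks) until the thread has finished. By calling join on every thread, you can be sure that once execution reaches the code after the join calls, all threads have finished and the thread objects can now be destroyed.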
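
To illustrate the join-then-read pattern described above, here is a minimal sketch. The function name parallelSum and the per-thread partial vector are assumptions made for illustration and do not appear in the original program. Each worker writes only to its own slot, and the main thread reads the slots only after every join has returned.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical example (names are illustrative): sums `data` using `nthreads`
// worker threads, each writing only to its own element of `partial`.
double parallelSum(const std::vector<double>& data, std::size_t nthreads) {
    if (nthreads == 0) nthreads = 1;
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(nthreads);
    const std::size_t n = data.size();
    for (std::size_t k = 0; k < nthreads; ++k) {
        // Half-open range [begin, end) handled by worker k.
        const std::size_t begin = k * n / nthreads;
        const std::size_t end = (k + 1) * n / nthreads;
        workers.emplace_back([&data, &partial, k, begin, end] {
            partial[k] = std::accumulate(data.data() + begin, data.data() + end, 0.0);
        });
    }
    // join() blocks until the corresponding thread has finished, so after
    // this loop every worker is done and its result is visible here.
    for (std::thread& w : workers) {
        w.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Every thread must be joined (or detached) before its std::thread object is destroyed; destroying a std::thread that is still joinable calls std::terminate.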