Results of tbb::parallel_reduce and std::accumulate differ

Keywords: results, std, parallel_reduce, TBB. Last updated: 2023-10-16

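I am learning Intel's TBB library. When summing all values in a std::vector, the result of tbb::parallel_reduce differs from that of std::accumulate once the vector holds more than 16,777,220 elements (I observed the error with 16,777,320 elements). Here is my minimal working example: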

#include <iostream>
#include <vector>
#include <numeric>
#include <limits>
#include "tbb/tbb.h"
int main(int argc, const char * argv[]) {
    int count = std::numeric_limits<int>::max() * 0.0079 - 187800; // - 187900 works
    std::vector<float> heights(count);
    std::fill(heights.begin(), heights.end(), 1.0f);
    float ssum = std::accumulate(heights.begin(), heights.end(), 0);
    float psum = tbb::parallel_reduce(tbb::blocked_range<std::vector<float>::iterator>(heights.begin(), heights.end()), 0,
                                      [](tbb::blocked_range<std::vector<float>::iterator> const& range, float init) {
                                          return std::accumulate(range.begin(), range.end(), init);
                                      }, std::plus<float>()
                                      );
    std::cout << std::endl << " Heights serial sum: " << ssum << "   parallel sum: " << psum;
    return 0;
}

Built with Xcode 6.3.1 and TBB stable 4.3-20141023 (installed from Brew) on OS X 10.10.3, it outputs:

Heights serial sum: 1.67772e+07   parallel sum: 1.67773e+07

Why? Should I report a bug to the TBB developers?

Additional tests, applying your answers:

 correct value is: 1949700403
 cause we add 1.0f to zero 1949700403 times
 using (int) init values:
 Runtime: 17.407 sec. Heights serial   sum: 16777216.000, wrong
 Runtime:  8.482 sec. Heights parallel sum: 131127368.000, wrong
 using (float) init values:
 Runtime: 12.594 sec. Heights serial   sum: 16777216.000, wrong
 Runtime:  5.044 sec. Heights parallel sum: 303073632.000, wrong
 using (double) initial values:
 Runtime: 13.671 sec. Heights serial   sum: 1949700352.000, wrong
 Runtime:  5.343 sec. Heights parallel sum: 263690016.000, wrong
 using (double) initial values and tbb::parallel_deterministic_reduce:
 Runtime: 13.463 sec. Heights serial   sum: 1949700352.000, wrong
 Runtime: 99.031 sec. Heights parallel sum: 1949700352.000, wrong >>> almost 10x slower !

Why do all the reduce calls produce wrong sums? Is (double) not enough? Here is my test code:

    #include <iostream>
    #include <vector>
    #include <numeric>
    #include <limits>
    #include <sys/time.h>
    #include <iomanip>
    #include "tbb/tbb.h"
    #include <cmath>
    class StopWatch {
    private:
        double elapsedTime;
        timeval startTime, endTime;
    public:
        StopWatch () : elapsedTime(0) {}
        void startTimer() {
            elapsedTime = 0;
            gettimeofday(&startTime, 0);
        }
        void stopNprintTimer() {
            gettimeofday(&endTime, 0);
            elapsedTime = (endTime.tv_sec - startTime.tv_sec) * 1000.0;             // compute sec to ms
            elapsedTime += (endTime.tv_usec - startTime.tv_usec) / 1000.0;          // compute us to ms and add
            std::cout << " Runtime: " << std::right << std::setw(6) << elapsedTime / 1000 << " sec.";             // show in sec
        }
    };
    int main(int argc, const char * argv[]) {
        StopWatch watch;
        std::cout << std::fixed << std::setprecision(3) << "" << std::endl;
        size_t count = std::numeric_limits<int>::max() * 0.9079;
        std::vector<float> heights(count);
        std::cout << " Vector size: " << count << std::endl;
        std::fill(heights.begin(), heights.end(), 1.0f);
        watch.startTimer();
        float ssum = std::accumulate(heights.begin(), heights.end(), 0.0); // change type of initial value here
        watch.stopNprintTimer();
        std::cout << " Heights serial   sum: " << std::right << std::setw(8) << ssum << std::endl;
        watch.startTimer();
        float psum = tbb::parallel_reduce(tbb::blocked_range<std::vector<float>::iterator>(heights.begin(), heights.end()), 0.0, // change type of initial value here
                                          [](tbb::blocked_range<std::vector<float>::iterator> const& range, float init) {
                                              return std::accumulate(range.begin(), range.end(), init);
                                          }, std::plus<float>()
                                          );
        watch.stopNprintTimer();
        std::cout << " Heights parallel sum: " << std::right << std::setw(8) << psum << std::endl;
        return 0;
    }

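To answer my last question: they all produce wrong results because floating-point sums are not made for integer addition with large numbers. Switching to int solves it: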

[...]
std::vector<int> heights(count);
std::cout << " Vector size: " << count << std::endl;
std::fill(heights.begin(), heights.end(), 1);
watch.startTimer();
int ssum = std::accumulate(heights.begin(), heights.end(), (int)0);
watch.stopNprintTimer();
std::cout << " Heights serial   sum: " << std::right << std::setw(8) << ssum << std::endl;
watch.startTimer();
int psum = tbb::parallel_reduce(tbb::blocked_range<std::vector<int>::iterator>(heights.begin(), heights.end()), (int)0,
                                  [](tbb::blocked_range<std::vector<int>::iterator> const& range, int init) {
                                      return std::accumulate(range.begin(), range.end(), init);
                                  }, std::plus<int>()
                                  );
watch.stopNprintTimer();
std::cout << " Heights parallel sum: " << std::right << std::setw(8) << psum << std::endl;
[...]

Result:

Vector size: 1949700403
Runtime: 13.041 sec. Heights serial   sum: 1949700403, correct
Runtime:  4.728 sec. Heights parallel sum: 1949700403, correct and almost 4x faster
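
The two recurring float values in the earlier runs, 16777216 and 1949700352, can be reproduced in isolation. The following is a minimal sketch, assuming float is IEEE 754 single precision (24-bit significand), as it is on common desktop platforms. The helper names add_one and narrow_to_float are illustrative and do not appear in the test code above.

```cpp
// Adds 1.0f to x and rounds the result to float,
// which is what a float accumulator does on every step.
float add_one(float x) {
    float r = x + 1.0f;
    return r;
}

// Converts a double to float, which is what happens when a
// double-precision sum is assigned to a variable such as `float ssum`.
float narrow_to_float(double x) {
    return static_cast<float>(x);
}
```

Under that assumption, add_one(16777216.0f) returns 16777216.0f again: 2^24 + 1 is not representable, so a serial float accumulator stops growing there. Likewise, narrow_to_float(1949700403.0) returns 1949700352.0f, because floats of that magnitude are spaced 128 apart. This suggests the double-accumulator runs added up correctly and lost the last digits only when the result was stored in a float.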

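Your call to std::accumulate is doing integer addition and then converting the result to float at the end of the computation. To accumulate in floating point, the accumulator should be a float (*).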

float ssum = std::accumulate(heights.begin(), heights.end(), 0.0f);
                                                             ^^^^

(*) Or any other type that can correctly accumulate floats.
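
The effect of the initial value's type is easier to see with fractional elements. In this small sketch the three wrapper functions are hypothetical names introduced for illustration; only std::accumulate itself comes from the question.

```cpp
#include <numeric>
#include <vector>

// std::accumulate deduces the accumulator type from the third argument,
// not from the element type. With the int literal 0, every partial sum
// is converted back to int, so fractional parts are dropped at each step.
int sum_with_int_init(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

// With 0.0f the accumulator is a float.
float sum_with_float_init(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0f);
}

// With 0.0 the accumulator is a double, which has more precision than float.
double sum_with_double_init(const std::vector<float>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}
```

For a vector of four elements equal to 0.5f, sum_with_int_init returns 0, while the float and double versions return 2.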

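In addition to the other correct answers to the "Why?" part, I would add that TBB provides parallel_deterministic_reduce, which guarantees reproducible results between two or more runs on the same data (although the result can still differ from that of std::accumulate). See the blog post that describes the problem and the deterministic algorithm.

So, regarding the "Should I report a bug to the TBB developers?" part, the answer is clearly no (unless you find something deficient on the TBB side).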

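The following may solve this particular problem for you:

Your call to std::accumulate is doing integer addition and then converting the result to float at the end of the computation.

However, floating-point addition is not an associative operation:

With accumulate: (...((s + a1) + a2) + ...) + an
With parallel_reduce: any grouping of the parentheses.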

http://docs.oracle.com/cd/e19957-01/806-3568/ncg_goldberg.html
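
Non-associativity can be shown without TBB. The sketch below assumes IEEE 754 single precision and plain (non fast-math) compilation. The function chunked_sum is a hypothetical serial stand-in that only imitates the regrouping a parallel reduction performs; it is not how tbb::parallel_reduce is implemented.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Left-to-right grouping, the order std::accumulate uses: (a + b) + c.
float sum_left(float a, float b, float c) {
    float ab = a + b;
    return ab + c;
}

// Alternative grouping: a + (b + c).
float sum_right(float a, float b, float c) {
    float bc = b + c;
    return a + bc;
}

// Sums v in `chunks` contiguous pieces (chunks must be >= 1) and then adds
// the partial sums from left to right. Everything runs serially; only the
// grouping of the additions differs from a single std::accumulate call.
float chunked_sum(const std::vector<float>& v, std::size_t chunks) {
    const std::size_t step = (v.size() + chunks - 1) / chunks;  // ceil(size / chunks)
    float total = 0.0f;
    for (std::size_t begin = 0; begin < v.size(); begin += step) {
        const std::size_t end = std::min(begin + step, v.size());
        total += std::accumulate(v.begin() + begin, v.begin() + end, 0.0f);
    }
    return total;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left yields 1 and sum_right yields 0. For a vector of 16,777,316 elements equal to 1.0f, chunked_sum with one chunk stalls at 16777216, whereas four chunks give 16777316, because each partial sum stays small enough to be exact. Different groupings therefore give different float totals for the same data, which matches the serial and parallel sums disagreeing in the question.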