Parallel for with omp
I am trying to optimize the following loop with OpenMP:
#pragma omp parallel for private(diff)
for (int j = 0; j < x.d; ++j) {
    diff = x(example,j) - x(chosen_pts[ndx - 1],j);
    #pragma omp atomic
    d2 += diff * diff;
}
But it actually runs 4 times slower than the version without the #pragma.
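Edit
As Piotr S., coincoin and erenon pointed out, in my case x.d is so small that parallelizing this loop makes the code slower. I am posting the outer loop as well; maybe there is some room for multithreading there (x.n is over 100 million):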
float sum_distribution = 0.0;
// look for the point that is furthest from any center
float max_dist = 0.0;
for (int i = 0; i < x.n; ++i) {
    int example = dist2[i].second;
    float d2 = 0.0, diff;
    //#pragma omp parallel for private(diff) reduction(+:d2)
    for (int j = 0; j < x.d; ++j) {
        diff = x(example,j) - x(chosen_pts[ndx - 1],j);
        d2 += diff * diff;
    }
    if (d2 < dist2[i].first) {
        dist2[i].first = d2;
    }
    if (dist2[i].first > max_dist) {
        max_dist = dist2[i].first;
    }
    sum_distribution += dist2[i].first;
}
If anyone is interested, here is the whole function: https://github.com/ghamerly/baylorml/blob/master/fast_kmeans/general_functions.cpp#L169, but by my measurements 85% of the elapsed time comes from this loop.
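Yes, the outer loop you posted can be parallelized with OpenMP. Every variable modified in the loop is either local to a single iteration or used to aggregate a result across the loop. I am also assuming that calling x() while computing diff has no side effects.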
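To perform the aggregation correctly and efficiently in parallel, you need an OpenMP loop with a reduction clause. For sum_distribution the reduction operation is +, and for max_dist it is max. So adding the following pragma in front of the outer loop should do the job: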
#pragma omp parallel for reduction(+:sum_distribution) reduction(max:max_dist)
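For context, here is a minimal self-contained sketch of the outer loop with that pragma applied. The Dataset struct, the Stats struct, the function name update_distances and the center parameter (standing in for chosen_pts[ndx - 1]) are hypothetical stand-ins invented for this illustration; they are not the types used in the linked repository.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the dataset: n rows, d columns, row-major storage.
struct Dataset {
    int n, d;
    std::vector<float> data;
    float operator()(int i, int j) const {
        return data[static_cast<std::size_t>(i) * d + j];
    }
};

// Hypothetical result type holding the two aggregated values.
struct Stats {
    float sum_distribution;
    float max_dist;
};

// center plays the role of chosen_pts[ndx - 1] in the question.
Stats update_distances(const Dataset& x, int center,
                       std::vector<std::pair<float, int>>& dist2) {
    float sum_distribution = 0.0f;
    float max_dist = 0.0f;

    // Each thread gets private copies of the two reduction variables,
    // which OpenMP combines with + and max when the loop ends.
    #pragma omp parallel for reduction(+:sum_distribution) reduction(max:max_dist)
    for (int i = 0; i < x.n; ++i) {
        int example = dist2[i].second;
        float d2 = 0.0f;  // declared inside the loop, so private to the iteration
        for (int j = 0; j < x.d; ++j) {
            float diff = x(example, j) - x(center, j);
            d2 += diff * diff;
        }
        // Only element i of dist2 is written, so iterations do not conflict.
        if (d2 < dist2[i].first) {
            dist2[i].first = d2;
        }
        if (dist2[i].first > max_dist) {
            max_dist = dist2[i].first;
        }
        sum_distribution += dist2[i].first;
    }
    return {sum_distribution, max_dist};
}
```

Declaring d2 and diff inside the loop body makes them private without an explicit private clause. One thing to keep in mind is that a parallel + reduction adds the floats in a different order than the serial loop, so the sum may differ slightly in the last digits.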
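Note that max as a reduction operation is only available starting with OpenMP 3.1. That version is not new, so most OpenMP-capable compilers already support it, but not all of them do, and you might be using an older compiler release. It therefore makes sense to check your compiler's documentation.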