pthread_join is being a bottleneck

Keywords: join, pthread. Updated: 2023-10-16

I have an application in which pthread_join is the bottleneck. I need help resolving this.

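#include <pthread.h>
#include <ctime>
#include <cstdint>
#include <iostream>
using namespace std;

struct { pthread_t p[16]; } threads;

void *calc_corr(void *t) {
         intptr_t h = (intptr_t)t;  // assumed: h is the thread index passed via t
         clock_t begin = clock();
         // do work
         clock_t end = clock();
         double duration = 1000.0 * ((double)end - (double)begin) / CLOCKS_PER_SEC;
         cout << "Time is " << duration << "\t" << h << endl;
         pthread_exit(NULL);
}
int main() {
         clock_t start_t = clock();
         for (intptr_t ii = 0; ii < 16; ii++)
            pthread_create(&threads.p[ii], NULL, &calc_corr, (void *)ii);
         for (int i = 0; i < 16; i++)
            pthread_join(threads.p[15 - i], NULL);
         clock_t stop_t = clock();
         double duration2 = 1000.0 * ((double)stop_t - (double)start_t) / CLOCKS_PER_SEC;
         cout << "\n Time is " << duration2 << "\t" << endl;
         return 0;
}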

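The times printed inside the thread function are in the range 40 ms to 60 ms, while the time printed by main is 650 ms to 670 ms. Ironically, my serial code also runs in 650 ms to 670 ms. What can I do to reduce the time taken by pthread_join?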

Thanks in advance!

On Linux, clock() measures combined CPU time, summed over all threads of the process. It does not measure wall time.

That is why you get ~640 ms = 16 * 40 ms (as pointed out in the comments).

To measure wall time, you should use something like:

gettimeofday()
clock_gettime()
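
As a rough sketch of the difference, the following C++ helpers time the same batch of threads with both clock() and clock_gettime(CLOCK_MONOTONIC). The names wall_ms, cpu_ms, busy_worker and time_threads are made up for illustration, and the busy loop is only a stand-in for the real work. On a multi-core machine the CPU figure should come out close to the sum over all threads, while the wall figure reflects the elapsed time the caller actually waits.

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Wall-clock time in milliseconds, read from a monotonic clock.
static double wall_ms() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

// CPU time in milliseconds consumed by the whole process (all threads summed).
static double cpu_ms() {
    return 1000.0 * (double)clock() / CLOCKS_PER_SEC;
}

// Hypothetical stand-in for the real computation: a busy loop.
static void *busy_worker(void *) {
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 50000000UL; ++i) x = x + i;
    return NULL;
}

struct Timing { double wall; double cpu; };

// Starts n threads (capped at 64), joins them, and reports both measurements.
static Timing time_threads(int n) {
    pthread_t tid[64];
    if (n > 64) n = 64;
    double w0 = wall_ms(), c0 = cpu_ms();
    for (int i = 0; i < n; ++i) {
        int rc = pthread_create(&tid[i], NULL, busy_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < n; ++i) pthread_join(tid[i], NULL);
    Timing t = { wall_ms() - w0, cpu_ms() - c0 };
    return t;
}
```

Calling time_threads(16) and printing both fields should show the wall time well below the CPU time when several cores are available; the exact numbers depend on the machine.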

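By creating threads, you add overhead to the system: creation time and scheduling time. Creating a thread requires allocating a stack, among other things; scheduling means more context switches. In addition, pthread_join suspends execution of the calling thread until the target thread terminates. This means you wait for thread 1 to finish; when it finishes, you are rescheduled as soon as possible, but not immediately; then you wait for thread 2, and so on.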

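Now, your computer has only a few cores, say one or two, and you are creating 16 threads. At most two threads of your program run at the same time, and simply adding up their clock measurements already gets you to around 400 ms.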
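
One way to avoid oversubscribing the cores is to size the number of threads from the number of online processors. The sketch below assumes Linux with glibc, where sysconf(_SC_NPROCESSORS_ONLN) is available; pick_thread_count is a hypothetical helper name, not something from the original program.

```cpp
#include <unistd.h>

// Suggests how many worker threads to start: one per online core,
// but never more than the number of independent work items.
static long pick_thread_count(long work_items) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);  // returns -1 on failure
    if (cores < 1) cores = 1;                    // fall back to a single thread
    return work_items < cores ? work_items : cores;
}
```

With 16 work items on a two-core machine this would suggest 2 threads, each processing 8 items in turn, instead of 16 threads competing for the same two cores.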

Again, this depends on a lot of things, so this is only a quick overview of what is happening.