Thread performance in C++

Keywords: performance, threads, C++. Updated: 2023-10-16

I have the following code:

  for ( int i = 0  ; i < 100 ; i++)
    for (int j = 0 ; j < 80 ; j ++)
    { 
     ...
    }

I split it into 8 threads.

pthread_t thread1, thread2,thread3,thread4,thread5,thread6,thread7,thread8;
int rc1,rc2,rc3,rc4,rc5,rc6,rc7,rc8;

 struct threads 
 {
 ...
 };
 void *PrintHello(void *args)
 {
   for (int j = 0 ; j < 10 ; j ++)
    { 
    }
 }
  for ( int i = 0 ; i < 100 ; i ++)
  {
    rc1 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread1,NULL);
    rc2 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread2,NULL);
    rc3 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread3,NULL);
    rc4 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread4,NULL);
    rc5 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread5,NULL);
    rc6 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread6,NULL);
    rc7 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread7,NULL);
    rc8 = pthread_create(&thread1, NULL,PrintHello,threads);
    pthread_join(thread8,NULL);
  }

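I expected the second version to be faster than the first. However, the second behaves as if it had only one thread. In other words, the unsplit code and the split code take the same time to run. Why do they have the same run time, when the second uses 8 threads and the first uses only one?

Thanks in advance.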

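pthread_join(thread1,NULL); blocks the main thread until thread1 has finished. You need to move all the pthread_join calls to after all the threads have been created, so that the threads can run concurrently.

You are also using thread1 for every pthread_create(&thread1, NULL,PrintHello,threads); call. You need to use the other thread variables as well: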

for (int i = 0; i < 100; i++)
{
    rc1 = pthread_create(&thread1, NULL, PrintHello, threads);
    rc2 = pthread_create(&thread2, NULL, PrintHello, threads);
    rc3 = pthread_create(&thread3, NULL, PrintHello, threads);
    rc4 = pthread_create(&thread4, NULL, PrintHello, threads);
    rc5 = pthread_create(&thread5, NULL, PrintHello, threads);
    rc6 = pthread_create(&thread6, NULL, PrintHello, threads);
    rc7 = pthread_create(&thread7, NULL, PrintHello, threads);
    rc8 = pthread_create(&thread8, NULL, PrintHello, threads);
    pthread_join(thread1, NULL);
    pthread_join(thread2, NULL);
    pthread_join(thread3, NULL);
    pthread_join(thread4, NULL);
    pthread_join(thread5, NULL);
    pthread_join(thread6, NULL);
    pthread_join(thread7, NULL);
    pthread_join(thread8, NULL);
}

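In addition to what @NathanOliver said, keep in mind that spawning a thread has a significant cost, so whatever work you hand to a thread has to be more expensive than spawning the thread itself. If the PrintHello method really does only what is shown, you will probably still see a slowdown compared with the single-threaded version. The usual way to offset this cost is to spawn a limited number of threads once, at the start, and distribute the work among them as they become available.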
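One way to picture "spawn the threads once and hand out work as they become free" is the minimal sketch below. It is not the asker's code: WorkQueue, process_cell, worker and run_with_workers are hypothetical names, and process_cell is only a stand-in for the elided loop body. Each worker claims the next unprocessed row from a shared atomic counter, so the threads are created once rather than once per outer iteration.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shared state; every name here is illustrative only.
struct WorkQueue {
    std::atomic<int> next{0};   // index of the next unclaimed row
    int rows = 0;               // outer loop iterations
    int cols = 0;               // inner loop iterations per row
    std::vector<long> row_sum;  // one slot per row, written by exactly one thread
};

// Stand-in for the elided loop body (an assumption, not the original work).
static long process_cell(int i, int j) { return static_cast<long>(i) * j; }

// Worker: keep claiming rows until none are left.
static void *worker(void *arg) {
    WorkQueue *q = static_cast<WorkQueue *>(arg);
    for (;;) {
        int i = q->next.fetch_add(1);  // atomically claim one row
        if (i >= q->rows) break;
        long sum = 0;
        for (int j = 0; j < q->cols; j++) sum += process_cell(i, j);
        q->row_sum[i] = sum;
    }
    return nullptr;
}

// Creates num_threads threads once, lets them drain the queue, returns the total.
long run_with_workers(int rows, int cols, int num_threads) {
    assert(rows >= 0 && num_threads > 0);
    WorkQueue q;
    q.rows = rows;
    q.cols = cols;
    q.row_sum.assign(static_cast<std::size_t>(rows), 0L);

    std::vector<pthread_t> threads(static_cast<std::size_t>(num_threads));
    int started = 0;
    for (; started < num_threads; started++) {
        if (pthread_create(&threads[started], nullptr, worker, &q) != 0) break;
    }
    if (started == 0) worker(&q);  // fall back to the calling thread
    for (int t = 0; t < started; t++) pthread_join(threads[t], nullptr);

    long total = 0;
    for (long s : q.row_sum) total += s;
    return total;
}
```

Calling run_with_workers(100, 80, 8) would cover the 100 x 80 iteration space from the question with 8 threads created in total, instead of 800 create/join pairs.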

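Last but not least, if your PrintHello method really does nothing more than that, i.e. printf("Hello\n") or something similar, you will most likely not see a performance gain in any case. printf() very likely takes a shared lock, which causes heavy contention when all of your threads repeatedly try to acquire it.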
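If the threads really must produce output, one possible way to reduce that contention is to let each thread format its text into a private buffer and write everything once after the joins. The sketch below assumes this approach; PrintTask, build_output and print_buffered are hypothetical names, not part of the question.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical per-thread argument; names are illustrative only.
struct PrintTask {
    int id = 0;
    int lines = 0;
    std::string out;  // private buffer, touched by one thread only
};

// Formats into the private buffer; never touches stdout.
static void *build_output(void *arg) {
    PrintTask *task = static_cast<PrintTask *>(arg);
    for (int j = 0; j < task->lines; j++) {
        task->out += "Hello from thread " + std::to_string(task->id) + "\n";
    }
    return nullptr;
}

// Runs the tasks, then writes all the text with a single fwrite call.
std::string print_buffered(int num_threads, int lines_per_thread) {
    assert(num_threads > 0 && lines_per_thread >= 0);
    const std::size_t n = static_cast<std::size_t>(num_threads);
    std::vector<PrintTask> tasks(n);
    std::vector<pthread_t> threads(n);
    for (std::size_t t = 0; t < n; t++) {
        tasks[t].id = static_cast<int>(t);
        tasks[t].lines = lines_per_thread;
    }

    std::size_t started = 0;
    for (; started < n; started++) {
        if (pthread_create(&threads[started], nullptr, build_output,
                           &tasks[started]) != 0) break;
    }
    // Any task whose thread could not be created runs on the calling thread.
    for (std::size_t t = started; t < n; t++) build_output(&tasks[t]);
    for (std::size_t t = 0; t < started; t++) pthread_join(threads[t], nullptr);

    std::string all;
    for (const PrintTask &task : tasks) all += task.out;
    std::fwrite(all.data(), 1, all.size(), stdout);  // stdout is used once
    return all;
}
```

The output is grouped by thread rather than interleaved, which is a behavioral change compared with calling printf() directly from every thread.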

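The bottom line is that multithreading is great for improving performance, but it is not trivial. Most of the time, simply throwing threads at your code will not improve performance at all, and in the worst case it will make things slower. If you want to see a speedup, profile your code and look for large tasks that can easily be divided into several jobs, each of which can work independently on a partial result. Jobs of that kind are easy to multithread and can achieve higher throughput than single-threaded code.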
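As an illustration of "independent jobs that each produce a partial result", here is a minimal sketch that sums an array by giving each thread its own contiguous slice and combining the partial sums after the joins. Slice, sum_slice and parallel_sum are hypothetical names, and summing integers is only a placeholder workload.

```cpp
#include <pthread.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one job; names are illustrative only.
struct Slice {
    const int *data;    // shared, read-only input
    std::size_t begin;  // first index of this slice
    std::size_t end;    // one past the last index
    long partial;       // result slot owned by exactly one thread
};

// Sums one slice into its own result slot; no locking is needed.
static void *sum_slice(void *arg) {
    Slice *s = static_cast<Slice *>(arg);
    long sum = 0;
    for (std::size_t i = s->begin; i < s->end; i++) sum += s->data[i];
    s->partial = sum;
    return nullptr;
}

// Splits values into num_threads contiguous slices and adds up the partial sums.
long parallel_sum(const std::vector<int> &values, int num_threads) {
    assert(num_threads > 0);
    const std::size_t n = values.size();
    const std::size_t workers = static_cast<std::size_t>(num_threads);
    const std::size_t chunk = (n + workers - 1) / workers;  // ceiling division

    std::vector<Slice> slices(workers);
    std::vector<pthread_t> threads(workers);
    for (std::size_t t = 0; t < workers; t++) {
        const std::size_t begin = std::min(t * chunk, n);
        slices[t] = Slice{values.data(), begin, std::min(begin + chunk, n), 0};
    }

    std::size_t started = 0;
    for (; started < workers; started++) {
        if (pthread_create(&threads[started], nullptr, sum_slice,
                           &slices[started]) != 0) break;
    }
    // Slices whose thread could not be created are summed on the calling thread.
    for (std::size_t t = started; t < workers; t++) sum_slice(&slices[t]);
    for (std::size_t t = 0; t < started; t++) pthread_join(threads[t], nullptr);

    long total = 0;
    for (const Slice &s : slices) total += s.partial;
    return total;
}
```

Because every thread writes only to its own partial field, the threads never wait on each other until the final join. For an input this cheap to process, the thread creation cost may still outweigh the gain.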

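In addition to what @NathanOliver and @JustSid said: if the threads all do the same work, defining 8 threads individually and writing the same code 8 times is not great. It is much better to use something like this: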

pthread_t threadlist[8];
int results[8];

A simple for loop is enough to create the threads:

for(int i = 0; i < 8; i++)
    results[i] = pthread_create(&threadlist[i], NULL, PrintHello, threads);

And wait for the threads to finish with:

for(int i = 0; i < 8; i++)
    pthread_join(threadlist[i], NULL);

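This reduces the amount of code considerably. There may be a slight difference at run time, but a for loop over an int only adds a few more assembly instructions. I don't know whether that difference is even measurable.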