C++11 async is using only one core

Keywords: one core, async, C++11. Updated: 2023-10-16

I am trying to parallelize a long-running function in C++ with std::async, but it only ever uses one core.

The function is not too short to benefit. The test data I'm currently using takes about 10 minutes to run.

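My approach is to create NThreads futures. Each one takes a contiguous chunk of the loop rather than a single element, so each is a long-running task, and each schedules one async call. After they are all created, the program blocks until they finish. Yet it always uses only one core?!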
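The chunking described above gives each task one contiguous block of the loop. The sketch below uses a hypothetical helper, `split_range`, that is not part of the question. It uses `std::size_t` instead of `unsigned __int128` to stay simple. The block size is `total / n_chunks`, rounded down, and the last chunk also takes the remainder.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split [0, total) into n_chunks half-open ranges
// [min, max). Assumes n_chunks > 0. Every chunk gets total / n_chunks items
// and the last chunk also absorbs the remainder.
std::vector<std::pair<std::size_t, std::size_t>>
split_range(std::size_t total, std::size_t n_chunks) {
  std::vector<std::pair<std::size_t, std::size_t>> chunks;
  const std::size_t block = total / n_chunks;  // floor division
  for (std::size_t i = 0; i < n_chunks; ++i) {
    std::size_t min = i * block;
    std::size_t max = (i == n_chunks - 1) ? total : min + block;
    chunks.emplace_back(min, max);
  }
  return chunks;
}
```

For example, `split_range(10, 4)` yields `[0,2) [2,4) [4,6) [6,10)`.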

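This isn't me eyeballing top and deciding it looks like roughly one CPU, either. My ZSH config prints the CPU% of the previous command, and it is always exactly 100%, never more.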

#include <future>
#include <iostream>
#include <vector>

// WMin, PathCountLength, ZeroChildInit, OneChildInit and
// KneeParallel::pathCountOrStatic are defined elsewhere in the program.
auto NThreads = 12;
// Floor division; the last block absorbs the remainder (see Max below).
auto BlockSize = PathCountLength / NThreads;
std::vector<std::future<std::vector<unsigned __int128>>> Futures;
for (auto I = 0; I < NThreads; ++I) {
    std::cout << "HERE" << std::endl;
    unsigned __int128 Min = I * BlockSize;
    unsigned __int128 Max = I * BlockSize + BlockSize;
    if (I == NThreads - 1)
        Max = PathCountLength;
    // No launch policy is given, so the default
    // std::launch::async | std::launch::deferred applies.
    Futures.push_back(std::async(
        [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
           std::vector<unsigned __int128> ZeroChildren,
           std::vector<unsigned __int128> OneChildren,
           unsigned __int128 PathCountLength)
            -> std::vector<unsigned __int128> {
            std::vector<unsigned __int128> LocalCount;
            for (unsigned __int128 I = Min; I < Max; ++I)
                LocalCount.push_back(KneeParallel::pathCountOrStatic(
                    WMin, I, ZeroChildren, OneChildren, PathCountLength));
            return LocalCount;
        },
        WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength));
}
for (auto &Future : Futures) {
    Future.get(); // blocks until that task's result is ready
}

Does anyone have any insight?

I'm compiling with clang and LLVM on Arch Linux. Do I need any compile flags? From what I can tell, C++11 standardized the thread library.

Edit: In case it gives anyone a further clue, when I comment out the local vector it runs on all cores. When I put it back, it drops back to one core.

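Edit 2: I've found a workaround, but it seems very odd. Returning the vector from the lambda pins it to one core. So now I work around it by passing a shared_ptr to an output vector and writing into that. Hey presto, it lights up every core!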

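Since I wasn't returning anything, I figured futures were now pointless and I'd use threads instead. Nope! Threads that return nothing also use only one core. Weird, right?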
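For comparison, here is a plain `std::thread` version that needs no return value. Each thread fills its own pre-allocated output slot, and the caller joins them all. `squares_in_chunks` is a hypothetical stand-in for the real workload, and `i * i` replaces `KneeParallel::pathCountOrStatic`, which the question does not show.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical example: one output slot per thread, so no two threads
// write to the same vector and no future is needed. Assumes n_threads > 0.
std::vector<std::vector<int>> squares_in_chunks(int total, int n_threads) {
  std::vector<std::vector<int>> outputs(n_threads);  // never resized below
  std::vector<std::thread> threads;
  const int block = total / n_threads;
  for (int t = 0; t < n_threads; ++t) {
    int min = t * block;
    int max = (t == n_threads - 1) ? total : min + block;
    std::vector<int> *slot = &outputs[t];  // stays valid: outputs is not resized
    threads.emplace_back([min, max, slot] {
      for (int i = min; i < max; ++i) slot->push_back(i * i);
    });
  }
  for (auto &th : threads) th.join();  // wait for every thread before returning
  return outputs;
}
```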

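OK, back to futures, and just return an int to throw away or something. Yep, you guessed it: even returning an int from the thread pins the program to one core. Except futures can't have void lambda functions. So my solution is to pass a pointer in for storing the output, and use an int lambda that never really returns anything. Yes, it feels like duct tape, but I can't see a better solution.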
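As an aside on the last point, `std::async` does accept a callable that returns `void`. The result is then a `std::future<void>`, and its `get()` only waits for completion and rethrows any exception. The function name `fill_with_async` below is hypothetical.

```cpp
#include <cassert>
#include <future>
#include <vector>

// A void-returning lambda launched with std::async yields std::future<void>.
void fill_with_async(std::vector<int> &out) {
  std::future<void> done = std::async(std::launch::async, [&out] {
    for (int i = 0; i < 5; ++i) out.push_back(i);
  });
  done.get();  // blocks until the task has finished writing to out
}
```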

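It seems very... weird? It's as if the compiler were somehow interpreting the lambda incorrectly. Could it be because I'm using a development build of LLVM rather than a stable branch...?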

Anyway, here is my solution, because the thing I hate most is finding my exact problem on here with no answer:

#include <future>
#include <memory>
#include <vector>

// WMin, PathCountLength, ZeroChildInit, OneChildInit and
// KneeParallel::pathCountOrStatic are defined elsewhere in the program.
auto NThreads = 4;
// Floor division; the last block absorbs the remainder (see Max below).
auto BlockSize = PathCountLength / NThreads;
auto Futures = std::vector<std::future<int>>(NThreads);
// One distinct output vector per task, so no two tasks push_back into
// the same vector.
auto OutputVectors =
    std::vector<std::shared_ptr<std::vector<unsigned __int128>>>(NThreads);
for (auto I = 0; I < NThreads; ++I) {
  OutputVectors[I] = std::make_shared<std::vector<unsigned __int128>>();
  unsigned __int128 Min = I * BlockSize;
  unsigned __int128 Max = I * BlockSize + BlockSize;
  if (I == NThreads - 1)
    Max = PathCountLength;
  Futures[I] = std::async(
      std::launch::async, // run each task on its own thread
      [](unsigned __int128 WMin, unsigned __int128 Min, unsigned __int128 Max,
         std::vector<unsigned __int128> ZeroChildren,
         std::vector<unsigned __int128> OneChildren,
         unsigned __int128 PathCountLength,
         std::shared_ptr<std::vector<unsigned __int128>> OutputVector)
          -> int {
        for (unsigned __int128 I = Min; I < Max; ++I) {
          OutputVector->push_back(KneeParallel::pathCountOrStatic(
              WMin, I, ZeroChildren, OneChildren, PathCountLength));
        }
        return 0; // dummy value; flowing off the end of an int lambda is UB
      },
      WMin, Min, Max, ZeroChildInit, OneChildInit, PathCountLength,
      OutputVectors[I]);
}
for (auto &Future : Futures) {
  Future.get(); // wait for each task to finish
}
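There is one pitfall when building per-task output vectors with the fill constructor, `std::vector<std::shared_ptr<T>>(n, std::make_shared<T>())`. `make_shared` is evaluated once, and that single `shared_ptr` is then copied into every element. As a result, all tasks would `push_back` into the same vector concurrently, which is a data race. The sketch below contrasts that with one `make_shared` call per element. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Out = std::vector<int>;

// Fill constructor: make_shared runs once, so all n elements share one vector.
std::vector<std::shared_ptr<Out>> shared_outputs(std::size_t n) {
  return std::vector<std::shared_ptr<Out>>(n, std::make_shared<Out>());
}

// One make_shared call per element: n independent vectors.
std::vector<std::shared_ptr<Out>> distinct_outputs(std::size_t n) {
  std::vector<std::shared_ptr<Out>> v;
  v.reserve(n);
  for (std::size_t i = 0; i < n; ++i) v.push_back(std::make_shared<Out>());
  return v;
}
```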

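The first argument to std::async is a launch policy. It can make the task deferred (std::launch::deferred), which means it runs lazily on whichever thread calls get() or wait() on the future. It can make the task run on its own thread (std::launch::async). Or it can let the implementation choose between the two (std::launch::async | std::launch::deferred). The last option is the default when no policy is passed.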
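One way to see the difference between the two policies is to compare thread IDs. A deferred task runs on the thread that calls `get()`, so it sees the caller's ID. An `async` task runs as if on a new thread, so it sees a different ID. `runs_on_caller` is a hypothetical helper written only for this check.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Returns true if a task launched with `policy` ran on the calling thread.
bool runs_on_caller(std::launch policy) {
  const std::thread::id caller = std::this_thread::get_id();
  std::future<std::thread::id> f =
      std::async(policy, [] { return std::this_thread::get_id(); });
  return f.get() == caller;  // for deferred, the task runs inside this get()
}
```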

So, to force each task onto its own thread, change the call to std::async(std::launch::async, /*...*/).
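With the policy set explicitly, the question's original design works: each future can return its `std::vector` directly, with no `shared_ptr` or dummy `int`. Below is a minimal sketch under stated assumptions. `parallel_squares` is hypothetical, and `i * i` stands in for the real per-item computation. On Linux, GCC and Clang builds that use threads are usually compiled with `-pthread`.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Each task returns its own vector through its future; std::launch::async
// ensures the tasks are not deferred onto the calling thread.
// Assumes n_tasks > 0.
std::vector<long long> parallel_squares(long long total, int n_tasks) {
  std::vector<std::future<std::vector<long long>>> futures;
  const long long block = total / n_tasks;
  for (int t = 0; t < n_tasks; ++t) {
    long long min = t * block;
    long long max = (t == n_tasks - 1) ? total : min + block;
    futures.push_back(std::async(
        std::launch::async, [min, max]() -> std::vector<long long> {
          std::vector<long long> local;
          for (long long i = min; i < max; ++i) local.push_back(i * i);
          return local;
        }));
  }
  std::vector<long long> result;
  for (auto &f : futures) {
    std::vector<long long> part = f.get();  // waits for this task
    result.insert(result.end(), part.begin(), part.end());  // keeps chunk order
  }
  return result;
}
```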