Intel TBB memory overhead

Keywords: overhead, memory, TBB, Intel. Updated: 2023-10-16

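We see high memory overhead when we use Intel TBB from our own threads. We expected that once a thread finishes a given unit of work, it would release the memory it used. That does not seem to happen, even when there are long pauses between the units of work a thread executes.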

We prepared an example that shows the problem:

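#include <cstddef>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/thread/thread.hpp>
#include <tbb/atomic.h>
#include <tbb/parallel_sort.h>

// blocking_queue<T> is not part of TBB or the standard library;
// its definition is not shown in this post.
int main() {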
   blocking_queue<size_t> command_input_queue;
   tbb::atomic<size_t> count = 1;
   //workers
   std::vector<std::thread> worker;
   for(size_t i = 0; i < 15; i++) {
      worker.push_back(std::thread([&command_input_queue, &count](){
        while(true)
        {
            size_t size;
            //wait for work..
            command_input_queue.wait_and_pop(size);
            //do some work with Intel TBB
            std::vector<int32_t> result(size);
            for(size_t i = 0; i < result.size(); i++) {
                result[i] =  i % 1000;
            }
            tbb::parallel_sort(result.begin(), result.end());
            size_t local_count = count++;
            std::cout << local_count << " work items executed " << std::endl;
        }
    }));
   }
   //enqueue work
   size_t work_items = 15;
   for(size_t i = 0; i < work_items   ; i++) {
      command_input_queue.push(10 * 1000 * 1000);
   }
   while(true) {
      boost::this_thread::sleep( boost::posix_time::seconds(1) );
      if(count > 15) {
         break;
      }
   }
   //wait for more commands
   std::cout << "Wait" << std::endl;
   boost::this_thread::sleep( boost::posix_time::seconds(60) );
   //----!During the wait, while no thread is active, 
   //the process still claims over 500 MB of memory!----
   for(size_t i = 0; i < 15; i++) {
     command_input_queue.push(1000 * 1000);
   }
...

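In the example we start 15 worker threads. Each one waits for a task, runs tbb::parallel_sort, and should release all of its resources when it is done. The problem is that after every task has been processed and all workers are waiting for new work, the process still holds about 500 MB of memory.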

Tools such as Valgrind's Massif do not show us where the claimed memory is. We link the program against libtbb.so only, so the TBB allocator should not be the problem.

Does anyone know how we can release the memory while the workers are idle?

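Memory is usually not returned to the OS when you call delete or free. To get it back you need to call malloc_trim, or the equivalent function of whichever allocator you use.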
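As a rough illustration of the point above, here is a minimal sketch that assumes Linux with glibc, where malloc_trim is available from <malloc.h>. The function name churn_then_trim and the 256-byte block size are made up for this example. It frees a batch of small blocks and then asks glibc to hand unused heap pages back to the OS.

```cpp
#include <malloc.h>   // malloc_trim: glibc extension, not portable
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: allocates and frees `block_count` small heap blocks,
// then asks glibc to return unused heap pages to the OS.
// Returns malloc_trim's result: 1 if some memory was released, 0 if not.
int churn_then_trim(std::size_t block_count) {
    {
        std::vector<std::unique_ptr<char[]>> blocks;
        blocks.reserve(block_count);
        for (std::size_t i = 0; i < block_count; ++i) {
            blocks.push_back(std::unique_ptr<char[]>(new char[256]));
        }
    }   // every block is freed here; the pages may still stay with the process
    return malloc_trim(0);   // 0 = keep no extra padding at the top of the heap
}
```

In the example from the question, a call like malloc_trim(0) could go right before the 60 second wait. It only affects memory held by glibc's malloc. If tbbmalloc turns out to be the active allocator, its own cleanup call would be needed instead; scalable_allocation_command with TBBMALLOC_CLEAN_ALL_BUFFERS from tbb/scalable_allocator.h looks like the counterpart, but that has not been tried against this program.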

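The TBB scheduler does cache task allocations, but that cannot explain 500 MB. What could explain it is that TBB dynamically loads the TBB allocator (tbbmalloc) if it finds it next to libtbb.so, and that allocator caches memory. You can check whether tbbmalloc was activated by setting the env var TBB_VERSION=1 before you start the program.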

What looks strange to me is this: why do you oversubscribe the machine with your own worker threads when TBB already creates its own workers?