
Waking up a thread is time-consuming

#ifndef THREADPOOL_H
#define THREADPOOL_H
#include <iostream>
#include <deque>
#include <functional>
#include <thread>
#include <condition_variable>
#include <mutex>
#include <atomic>
#include <vector>
//thread pool
class ThreadPool
{
public:
    ThreadPool(unsigned int n = std::thread::hardware_concurrency())
        : busy()
        , processed()
        , stop()
    {
        for (unsigned int i=0; i<n; ++i)
            workers.emplace_back(std::bind(&ThreadPool::thread_proc, this));
    }
    template<class F> void enqueue(F&& f)
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        tasks.emplace_back(std::forward<F>(f));
        cv_task.notify_one();
    }
    void waitFinished()
    {
        std::unique_lock<std::mutex> lock(queue_mutex);
        cv_finished.wait(lock, [this](){ return tasks.empty() && (busy == 0); });
    }
    ~ThreadPool()
    {
        // set stop-condition
        std::unique_lock<std::mutex> latch(queue_mutex);
        stop = true;
        cv_task.notify_all();
        latch.unlock();
        // all threads terminate, then we're done.
        for (auto& t : workers)
            t.join();
    }
    unsigned int getProcessed() const { return processed; }
private:
    std::vector< std::thread > workers;
    std::deque< std::function<void()> > tasks;
    std::mutex queue_mutex;
    std::condition_variable cv_task;
    std::condition_variable cv_finished;
    unsigned int busy;
    std::atomic_uint processed;
    bool stop;
    void thread_proc()
    {
        while (true)
        {
            std::unique_lock<std::mutex> latch(queue_mutex);
            cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
            if (!tasks.empty())
            {
                // got work. set busy.
                ++busy;
                // pull from queue
                auto fn = tasks.front();
                tasks.pop_front();
                // release lock. run async
                latch.unlock();
                // run function outside context
                fn();
                ++processed;
                latch.lock();
                --busy;
                cv_finished.notify_one();
            }
            else if (stop)
            {
                break;
            }
        }
    }
};
#endif // THREADPOOL_H

I have the thread pool implementation above, which uses a latch. However, every time I add a task through the enqueue call, the overhead is large: it takes roughly 100 microseconds.

How can I improve the performance of the thread pool?

Your code looks fine. The comment on your question about compiling with release optimizations is probably correct, and may be all you need to do.

Disclaimer: always measure your code with the appropriate tools to identify where the bottlenecks are before attempting to improve its performance. Otherwise you may not get the improvement you are looking for.
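In that spirit, a small micro-benchmark is one way to confirm the reported ~100 microsecond enqueue cost before changing anything. The sketch below assumes the class above is saved as ThreadPool.h (a hypothetical filename) and times only the enqueue calls themselves; build it with release optimizations.

// Hypothetical micro-benchmark: times N enqueue() calls and reports the average cost.
#include <chrono>
#include <cstdio>
#include "ThreadPool.h" // assumed filename for the header above

int main()
{
    ThreadPool pool;
    constexpr int kTasks = 10000;

    const auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < kTasks; ++i)
        pool.enqueue([]{ /* deliberately trivial task */ });
    const auto end = std::chrono::steady_clock::now();

    pool.waitFinished();

    const auto total_us =
        std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    std::printf("average enqueue cost: %.3f us over %d tasks\n",
                static_cast<double>(total_us) / kTasks, kTasks);
}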

That said, there are a couple of potential micro-optimizations that I can see.

In your thread_proc function, change this:

    while (true)
    {
        std::unique_lock<std::mutex> latch(queue_mutex);
        cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
        if (!tasks.empty())

to this:

    std::unique_lock<std::mutex> latch(queue_mutex);
    while (!stop)
    {
        cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
        while (!tasks.empty() && !stop)

and then remove the else if (stop) block at the end of the function.

The main effect of this is that it avoids the extra "unlock" and "lock" of queue_mutex that happens because latch goes out of scope on every iteration of the while loop. Changing if (!tasks.empty()) to while (!tasks.empty()) may also save a cycle or two by letting the currently executing thread, which already holds the lock and the CPU quantum, pull the next task straight away.
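Putting the two changes together, the whole of thread_proc would read roughly as follows; this is only a sketch that reuses the member names from the class above.

    void thread_proc()
    {
        std::unique_lock<std::mutex> latch(queue_mutex);
        while (!stop)
        {
            // sleep until there is work to do or we are asked to stop
            cv_task.wait(latch, [this](){ return stop || !tasks.empty(); });
            // drain the queue while we already hold the lock
            while (!tasks.empty() && !stop)
            {
                ++busy;
                auto fn = tasks.front();
                tasks.pop_front();
                // run the task without holding the lock
                latch.unlock();
                fn();
                ++processed;
                latch.lock();
                --busy;
                cv_finished.notify_one();
            }
        }
    }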

One last, admittedly opinion-based point. I have always believed that the notify should be done outside the lock, so that there is no contention on the mutex when the thread that has just updated the queue wakes up another thread. I have never actually measured this assumption, though, so take it with a grain of salt:

template<class F> void enqueue(F&& f)
{
    queue_mutex.lock();
        tasks.emplace_back(std::forward<F>(f));
    queue_mutex.unlock();
    cv_task.notify_one();
}