Why does std::condition_variable make scheduling unfair?
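I am trying to create a simple pool object that hands out access to a set of shared resources, more or less fairly, to any thread that asks for it. On Windows I would typically have an array of mutexes and call WaitForMultipleObjects with bWaitAll=FALSE (see windows_pool_of_n_t below). But I hope to port this to other operating systems someday, so I would like to stick to the standard. A deque of resources with a condition_variable on size() != 0 seemed like the obvious solution (see pool_of_n_t below).
But for reasons I do not understand, that code serializes thread access. I am not expecting strict fairness, but this is pretty much the worst case possible: the thread that held the lock last time always gets the lock next time. It is not that std::mutex fails to follow Windows' more-or-less fair scheduling, because using just a mutex without the condition variable works as expected, though of course only for a pool of one (see pool_of_one_t below).
Can anyone explain this? Is there a way around it?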
Results:
C:\temp\stdpool>bin\stdpool.exe
pool:pool_of_one_t
thread 0:19826 ms
thread 1:19846 ms
thread 2:19866 ms
thread 3:19886 ms
thread 4:19906 ms
thread 5:19926 ms
thread 6:19946 ms
thread 7:19965 ms
thread 8:19985 ms
thread 9:20004 ms
pool:windows_pool_of_n_t(1)
thread 0:19819 ms
thread 1:19838 ms
thread 2:19858 ms
thread 3:19878 ms
thread 4:19898 ms
thread 5:19918 ms
thread 6:19938 ms
thread 7:19958 ms
thread 8:19978 ms
thread 9:19997 ms
pool:pool_of_n_t(1)
thread 9:3637 ms
thread 0:4538 ms
thread 6:7558 ms
thread 4:9779 ms
thread 8:9997 ms
thread 2:13058 ms
thread 1:13997 ms
thread 3:17076 ms
thread 5:17995 ms
thread 7:19994 ms
pool:windows_pool_of_n_t(2)
thread 1:9919 ms
thread 0:9919 ms
thread 2:9939 ms
thread 3:9939 ms
thread 5:9958 ms
thread 4:9959 ms
thread 6:9978 ms
thread 7:9978 ms
thread 9:9997 ms
thread 8:9997 ms
pool:pool_of_n_t(2)
thread 2:6019 ms
thread 0:7882 ms
thread 4:8102 ms
thread 5:8182 ms
thread 1:8382 ms
thread 8:8742 ms
thread 7:9162 ms
thread 9:9641 ms
thread 3:9802 ms
thread 6:10201 ms
pool:windows_pool_of_n_t(5)
thread 4:3978 ms
thread 3:3978 ms
thread 2:3979 ms
thread 0:3980 ms
thread 1:3980 ms
thread 9:3997 ms
thread 7:3999 ms
thread 6:3999 ms
thread 5:4000 ms
thread 8:4001 ms
pool:pool_of_n_t(5)
thread 2:3080 ms
thread 0:3498 ms
thread 8:3697 ms
thread 3:3699 ms
thread 6:3797 ms
thread 7:3857 ms
thread 1:3978 ms
thread 4:4039 ms
thread 9:4057 ms
thread 5:4059 ms
Code:
#include <iostream>
#include <deque>
#include <vector>
#include <mutex>
#include <thread>
#include <sstream>
#include <chrono>
#include <iomanip>
#include <cassert>
#include <condition_variable>
#include <windows.h>
using namespace std;
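// Common interface: check_out() blocks until a resource is free and returns its
// index; check_in() returns that resource to the pool.
class pool_t {
public:
    virtual void check_in(int resource) = 0;
    virtual int check_out() = 0;
    virtual string pool_name() = 0;
};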
class pool_of_one_t : public pool_t {
    mutex lock;
public:
    virtual void check_in(int resource) {
        lock.unlock();
    }
    virtual int check_out() {
        lock.lock();
        return 0;
    }
    virtual string pool_name() {
        return "pool_of_one_t";
    }
};
class windows_pool_of_n_t : public pool_t {
    vector<HANDLE> resources;
public:
    windows_pool_of_n_t(int size) {
        for (int i=0; i < size; ++i)
            resources.push_back(CreateMutex(NULL, FALSE, NULL));
    }
    ~windows_pool_of_n_t() {
        for (auto resource : resources)
            CloseHandle(resource);
    }
    virtual void check_in(int resource) {
        ReleaseMutex(resources[resource]);
    }
    virtual int check_out() {
        DWORD result = WaitForMultipleObjects(resources.size(),
            resources.data(), FALSE, INFINITE);
        assert(result >= WAIT_OBJECT_0
            && result < WAIT_OBJECT_0+resources.size());
        return result - WAIT_OBJECT_0;
    }
    virtual string pool_name() {
        ostringstream name;
        name << "windows_pool_of_n_t(" << resources.size() << ")";
        return name.str();
    }
};
class pool_of_n_t : public pool_t {
    deque<int> resources;
    mutex lock;
    condition_variable not_empty;
public:
    pool_of_n_t(int size) {
        for (int i=0; i < size; ++i)
            check_in(i);
    }
    virtual void check_in(int resource) {
        unique_lock<mutex> resources_guard(lock);
        resources.push_back(resource);
        // Unlock before notifying so the woken thread does not block on the mutex.
        resources_guard.unlock();
        not_empty.notify_one();
    }
    virtual int check_out() {
        unique_lock<mutex> resources_guard(lock);
        // Returns immediately, without waiting, if a resource is already available.
        not_empty.wait(resources_guard,
            [this](){return resources.size() > 0;});
        auto resource = resources.front();
        resources.pop_front();
        // If resources remain, pass the wakeup on to another waiter.
        bool notify_others = resources.size() > 0;
        resources_guard.unlock();
        if (notify_others)
            not_empty.notify_one();
        return resource;
    }
    virtual string pool_name() {
        ostringstream name;
        name << "pool_of_n_t(" << resources.size() << ")";
        return name.str();
    }
};
void worker_thread(int id, pool_t& resource_pool)
{
    auto start_time = chrono::system_clock::now();
    for (int i=0; i < 100; ++i) {
        auto resource = resource_pool.check_out();
        this_thread::sleep_for(chrono::milliseconds(20));
        resource_pool.check_in(resource);
        this_thread::yield();
    }
    static mutex cout_lock;
    {
        unique_lock<mutex> cout_guard(cout_lock);
        cout << "thread " << id << ":"
            << chrono::duration_cast<chrono::milliseconds>(
                chrono::system_clock::now() - start_time).count()
            << " ms" << endl;
    }
}
void test_it(pool_t& resource_pool)
{
    cout << "pool:" << resource_pool.pool_name() << endl;
    vector<thread> threads;
    for (int i=0; i < 10; ++i)
        threads.push_back(thread(worker_thread, i, ref(resource_pool)));
    for (auto& thread : threads)
        thread.join();
}
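int main(int argc, char* argv[])
{
    // Named objects are used because a temporary cannot bind to the
    // non-const pool_t& parameter of test_it in standard C++.
    pool_of_one_t one;
    test_it(one);
    windows_pool_of_n_t win1(1);
    test_it(win1);
    pool_of_n_t std1(1);
    test_it(std1);
    windows_pool_of_n_t win2(2);
    test_it(win2);
    pool_of_n_t std2(2);
    test_it(std2);
    windows_pool_of_n_t win5(5);
    test_it(win5);
    pool_of_n_t std5(5);
    test_it(std5);
    return 0;
}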
I ran your pool_of_n_t(2) test on Linux and found that the problem lies in this_thread::yield(). Here are the results on my machine for pool_of_n_t(2):
1) With this_thread::yield():
$./a.out
pool:pool_of_n_t(2)
thread 0, run for:2053 ms
thread 9, run for:3721 ms
thread 5, run for:4830 ms
thread 6, run for:6854 ms
thread 3, run for:8229 ms
thread 4, run for:8353 ms
thread 7, run for:9441 ms
thread 2, run for:9482 ms
thread 1, run for:10127 ms
thread 8, run for:10426 ms
These are similar to yours.
2) The same test after replacing this_thread::yield() with pthread_yield():
$ ./a.out
pool:pool_of_n_t(2)
thread 0, run for:7922 ms
thread 3, run for:8853 ms
thread 4, run for:8854 ms
thread 1, run for:9077 ms
thread 5, run for:9364 ms
thread 9, run for:9446 ms
thread 7, run for:9594 ms
thread 2, run for:9615 ms
thread 8, run for:10170 ms
thread 6, run for:10416 ms
This is much fairer. You assume that this_thread::yield() really gives the CPU to another thread, but it does not.
Here is the disassembly of this_thread::yield for gcc 4.8:
(gdb) disassembl this_thread::yield
Dump of assembler code for function std::this_thread::yield():
0x0000000000401fb2 <+0>: push %rbp
0x0000000000401fb3 <+1>: mov %rsp,%rbp
0x0000000000401fb6 <+4>: pop %rbp
0x0000000000401fb7 <+5>: retq
End of assembler dump.
I don't see any rescheduling there.
And here is the disassembly of pthread_yield:
(gdb) disassemble pthread_yield
Dump of assembler code for function pthread_yield:
0x0000003149c084c0 <+0>: jmpq 0x3149c05448 <sched_yield@plt>
End of assembler dump.
(gdb) disassemble sched_yield
Dump of assembler code for function sched_yield:
0x00000031498cf520 <+0>: mov $0x18,%eax
0x00000031498cf525 <+5>: syscall
0x00000031498cf527 <+7>: cmp $0xfffffffffffff001,%rax
0x00000031498cf52d <+13>: jae 0x31498cf530 <sched_yield+16>
0x00000031498cf52f <+15>: retq
0x00000031498cf530 <+16>: mov 0x2bea71(%rip),%rcx # 0x3149b8dfa8
0x00000031498cf537 <+23>: xor %edx,%edx
0x00000031498cf539 <+25>: sub %rax,%rdx
0x00000031498cf53c <+28>: mov %edx,%fs:(%rcx)
0x00000031498cf53f <+31>: or $0xffffffffffffffff,%rax
0x00000031498cf543 <+35>: jmp 0x31498cf52f <sched_yield+15>
End of assembler dump.
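If this_thread::yield() compiles to a no-op as in the dump above, one possible workaround is to call the operating system's yield directly. The following is only a sketch: yield_cpu is a hypothetical helper name, not something from the original code, and it assumes a POSIX or Windows target. On POSIX it calls sched_yield(), which is what pthread_yield jumps to in the disassembly above; on Windows it calls SwitchToThread().

```cpp
#include <thread>
#if defined(_WIN32)
#  include <windows.h>
#elif defined(__unix__) || defined(__APPLE__)
#  include <sched.h>
#endif

// Hypothetical helper (not in the original code): ask the OS scheduler
// directly to run another ready thread. Returns 0 on success.
inline int yield_cpu()
{
#if defined(_WIN32)
    // The BOOL result only says whether another thread was actually
    // switched to, so it is ignored here.
    SwitchToThread();
    return 0;
#elif defined(__unix__) || defined(__APPLE__)
    // POSIX: relinquish the processor; returns 0 on success.
    return sched_yield();
#else
    // Fallback: rely on whatever the standard library provides.
    std::this_thread::yield();
    return 0;
#endif
}
```

In worker_thread, the call to this_thread::yield() after check_in could then be replaced with yield_cpu(). Whether that improves fairness still depends on the scheduler, so it is worth measuring rather than assuming.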
I don't think the condition variable is the culprit.
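Both the Linux "Completely Fair Scheduler" and the Windows thread dispatcher assume that the ideal goal is to give every thread its whole time slice (that is, to be fair). They take this so far as to assume that if a thread yields before consuming its entire time slice, it should be put near the front of the queue [this is a gross simplification], because that is the "fair" thing to do.
I find this unfortunate. If you have three threads, one of which can do work while the other two are blocked waiting for it, both the Windows and Linux schedulers will bounce back and forth between the blocked threads many times before giving the "right" thread a chance.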
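If the order of access matters more than raw throughput, one option is to enforce it inside the pool instead of relying on the scheduler. In pool_of_n_t, a thread that has just checked a resource in can call check_out again and take it back before the notified waiter wakes up, because wait() with a predicate that is already true returns immediately. The sketch below hands out tickets so that requests are served in arrival order. The class name fair_pool_of_n_t and the members next_ticket, now_serving and turn_or_resource are hypothetical, and this is a minimal illustration rather than a tuned implementation.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical FIFO variant of pool_of_n_t (names are illustrative only).
class fair_pool_of_n_t {
    std::deque<int> resources;
    std::mutex lock;
    std::condition_variable turn_or_resource;
    std::uint64_t next_ticket = 0;  // next ticket to hand out
    std::uint64_t now_serving = 0;  // ticket currently allowed to take a resource
public:
    explicit fair_pool_of_n_t(int size) {
        for (int i = 0; i < size; ++i)
            resources.push_back(i);
    }
    void check_in(int resource) {
        {
            std::lock_guard<std::mutex> guard(lock);
            resources.push_back(resource);
        }
        // notify_all, not notify_one: only the holder of the ticket being
        // served may proceed, and notify_one could wake a different waiter.
        turn_or_resource.notify_all();
    }
    int check_out() {
        std::unique_lock<std::mutex> guard(lock);
        const std::uint64_t my_ticket = next_ticket++;
        // Wait until it is this caller's turn AND a resource is available.
        turn_or_resource.wait(guard, [&] {
            return my_ticket == now_serving && !resources.empty();
        });
        int resource = resources.front();
        resources.pop_front();
        ++now_serving;
        guard.unlock();
        // Let the next ticket holder re-check; it proceeds if resources remain.
        turn_or_resource.notify_all();
        return resource;
    }
};
```

A thread that checks in and immediately calls check_out again receives a later ticket than every thread already waiting, so it cannot jump the queue. The cost is that notify_all wakes all waiters on each hand-off, which may matter with many threads.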