Is there an efficient built-in function in C++/C to quickly and uniformly sample b entries without replacement from n entries?

Updated: 2023-10-16

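It seems that calling shuffle(Index, Index+n, g) and then taking the first b entries is still inefficient when n is very large but b is very small, where Index is a vector/array storing 0 ... (n-1).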

You can take the standard shuffle algorithm and modify it to stop after it has shuffled only the first b entries.
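A minimal sketch of that idea, assuming the sample is wanted as indices in 0 ... (n-1). The helper name partial_shuffle_sample is hypothetical and not part of the standard library or of the original answer. It runs the usual Fisher-Yates swap loop but stops after b steps instead of n. Note that filling the index array is still O(n); only the shuffling work drops to O(b).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper (an illustration, not from the original answer):
// returns b distinct indices drawn uniformly from 0 .. n-1 by running
// only the first b steps of a Fisher-Yates shuffle.
template <class Engine>
std::vector<std::size_t> partial_shuffle_sample(std::size_t n, std::size_t b, Engine& engine)
{
    assert(b <= n);

    // Building the index array is still linear in n.
    std::vector<std::size_t> index(n);
    std::iota(index.begin(), index.end(), std::size_t{0});

    for (std::size_t i = 0; i < b; ++i)
    {
        // Choose a uniformly random position in the not-yet-fixed part [i, n-1]
        // and move that element into slot i.
        std::uniform_int_distribution<std::size_t> pick(i, n - 1);
        std::swap(index[i], index[pick(engine)]);
    }

    // The first b slots now hold the sample; drop the rest.
    index.resize(b);
    return index;
}
```

It could be called as partial_shuffle_sample(n, b, engine) with engine being, for example, a std::mt19937 seeded from std::random_device.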

Here is a different idea from the one sh1 suggested. Unlike the previous answer, it is mathematically sound. No % BS.

    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <ostream>
    #include <random>
    #include <set>
    #include <vector>

    int main()
    {
        // Configuration.
        auto constexpr n = 100;
        auto constexpr b = 10;

        // Building blocks for randomness.
        auto const seed = std::mt19937::result_type{ std::random_device{}() };
        std::cout << "seed = " << seed << std::endl;
        auto engine = std::mt19937{ seed };
        auto distribution = std::uniform_int_distribution<>{};

        // Creating the data source. Not part of the solution.
        auto reservoir = std::vector<int>{};
        std::generate_n(std::back_inserter(reservoir), n,
            [i = 0]() mutable { return i++; });

        // Creating the sample.
        // Idea attributed to Bob Floyd by Jon Bentley in Programming Pearls 2nd Edition.
        auto sample = std::set<int>{};
        for (auto i = n - b; i != n; ++i)
        {
            auto const param = std::uniform_int_distribution<>::param_type(0, i);
            auto const j = distribution(engine, param);
            (sample.find(j) == sample.cend())
                ? sample.insert(j)
                : sample.insert(i);
        }

        // Converting the sample to an output vector and shuffle it, if necessary.
        auto output = std::vector<int>(std::cbegin(sample), std::cend(sample));
        std::shuffle(std::begin(output), std::end(output), engine);

        // Print out the result.
        for (auto const x : output) { std::cout << x << " "; }
        std::cout << std::endl;

        return 0;
    }
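Regarding the "built-in function" part of the question title: since C++17 the standard library provides std::sample in <algorithm>, which selects elements uniformly without replacement. The sketch below is only for comparison, and the wrapper name std_sample_indices is hypothetical. std::sample has to walk the whole input range, so its cost is linear in n, which is the cost the question wants to avoid when b is much smaller than n.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical wrapper (an illustration, requires C++17): draws b distinct
// indices from 0 .. n-1 with std::sample. Both the index array and the
// std::sample call are linear in n.
template <class Engine>
std::vector<int> std_sample_indices(int n, int b, Engine& engine)
{
    assert(n >= 0 && b >= 0);

    std::vector<int> index(static_cast<std::size_t>(n));
    std::iota(index.begin(), index.end(), 0);

    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(b));

    // std::sample copies min(b, n) elements; with forward iterators the
    // selected elements keep their relative order from the input range.
    std::sample(index.begin(), index.end(), std::back_inserter(out), b, engine);
    return out;
}
```

Because the result keeps the input order, a std::shuffle on the returned vector would still be needed if the order of the sample must be random as well.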