Random numbers external sort
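I need to write a program that generates N random numbers and writes them to a binary file in descending order. It should be done without using any sorting algorithm that uses main memory. This is what I have done so far: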
#include <iostream>
#include <fstream>
#include <ctime>
#include <cstdlib>
using namespace std;
int main() {
    srand(time(0));
    rand();  // discard the first value
    int N;
    do {
        cout << "Unesite N: ";  // prompt: "Enter N: "
        cin >> N;
    } while (N <= 0);
    // Write N unsorted random ints to the file in raw binary form.
    ofstream br("broj.dat", ios::binary | ios::trunc);
    for (int i = 0; i < N; i++) {
        int a = rand();
        br.write((char *)&a, sizeof(a));
    }
    br.close();
    return 0;
}
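So I have generated the random numbers and written them to a binary file, but I don't know how to sort it.
You can generate the numbers in sorted order in linear time. The paper describing how to do this is "Generating Sorted Lists of Random Numbers" by Bentley & Saxe: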
https://pdfs.semanticscholar.org/2dbc/4e3f10b88888832fcd5fb8888888d34b8fb0b0b0102000.pdf
/**
 * Generate a sorted list of random numbers sorted from 1 to 0, given the size
 * of the list being requested.
 *
 * This is an implementation of an algorithm developed by Bentley and Saxe, and
 * published in ACM Transactions on Mathematical Software (v6, iss3, 1980) on
 * 'Generating Sorted Lists of Random Numbers'.
 */
public class SortedRandomDoubleGenerator {
    private long valsFound;
    private double curMax;
    private final long numVals;

    /**
     * Instantiate a generator of sorted random doubles.
     *
     * @param numVals the size of the list of sorted random doubles to be
     *                generated
     */
    public SortedRandomDoubleGenerator(long numVals) {
        curMax = 1.0;
        valsFound = 0;
        this.numVals = numVals;
    }

    /**
     * @return the next random number, in descending order.
     */
    public double getNext() {
        // Uniform draw from [0, 1).
        double u = java.util.concurrent.ThreadLocalRandom.current().nextDouble();
        // u^(1/k) is distributed like the maximum of the k values still to
        // come, so scaling the previous maximum by it gives the next value.
        curMax = curMax * Math.exp(Math.log(u) / (numVals - valsFound));
        valsFound++;
        return curMax;
    }
}
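For the setting in the question, the same idea can be written in C++ so that the values go straight into the binary file, already in descending order. This is only a minimal sketch and not the answerer's code: the function name `writeSortedRandom`, the `seed` parameter, and the choice to scale each double to an `int` in [0, RAND_MAX] (to match the range of `rand()`) are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <random>
#include <string>

// Writes n pseudo-random ints to `path` in non-increasing order while keeping
// only one value in memory at a time. Returns false on an I/O failure.
// (Hypothetical helper; name and parameters are illustrative.)
bool writeSortedRandom(const std::string& path, std::uint64_t n, std::uint64_t seed) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;

    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);

    double curMax = 1.0;  // upper bound for every value still to be generated
    for (std::uint64_t i = 0; i < n; ++i) {
        double u = uniform(rng);
        // u^(1 / remaining) is distributed like the maximum of `remaining`
        // uniform draws, so curMax can only stay equal or shrink.
        curMax *= std::exp(std::log(u) / static_cast<double>(n - i));
        // Assumption: map [0, 1] onto the range of rand(), i.e. [0, RAND_MAX].
        int value = static_cast<int>(curMax * RAND_MAX);
        out.write(reinterpret_cast<const char*>(&value), sizeof value);
    }
    return static_cast<bool>(out);
}
```

Calling `writeSortedRandom("broj.dat", N, seed)` would replace both the generation loop and the sort. Because the doubles are truncated to ints, neighbouring values may be equal, which still satisfies descending order.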
Here is pseudocode for how I would do it.
for i in 1..N:
    write rand() to new file
    push onto file stack (new file, size=1)
    while 2 <= len(file stack) and size of top two files the same:
        pop top two and merge them
        push onto file stack (merged file, size=new size)
while 2 <= len(file stack):
    pop top two and merge them
    push onto file stack (merged file, size=new size)
The top of the file stack is your new sorted file.
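The file-stack idea can be sketched in C++ roughly as follows. This is an illustration under assumptions, not a tuned implementation: the names `Run`, `readInt`, `mergeRuns`, and `externalSortRandom`, as well as the temporary-file naming scheme, are hypothetical. Each merge holds only one value per input file in memory, and merging continues for as long as at least two files remain on the stack.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

struct Run { std::string path; std::size_t size; };  // one sorted file on the stack

static bool readInt(std::ifstream& in, int& v) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&v), sizeof v));
}

// Merge two descending runs into a new descending file, then delete the inputs.
Run mergeRuns(const Run& a, const Run& b, const std::string& outPath) {
    {
        std::ifstream inA(a.path, std::ios::binary), inB(b.path, std::ios::binary);
        std::ofstream out(outPath, std::ios::binary | std::ios::trunc);
        int x = 0, y = 0;
        bool hasX = readInt(inA, x), hasY = readInt(inB, y);
        while (hasX || hasY) {
            bool takeX = hasX && (!hasY || x >= y);  // larger value goes first
            int v = takeX ? x : y;
            out.write(reinterpret_cast<const char*>(&v), sizeof v);
            if (takeX) hasX = readInt(inA, x); else hasY = readInt(inB, y);
        }
    }  // streams are closed here, before the inputs are removed
    std::remove(a.path.c_str());
    std::remove(b.path.c_str());
    return Run{outPath, a.size + b.size};
}

// Generates n values with rand() and returns the path of the file holding
// them in descending order (an empty string if n == 0).
std::string externalSortRandom(std::size_t n, const std::string& prefix) {
    std::vector<Run> stack;
    std::size_t nextId = 0;
    auto newPath = [&] { return prefix + std::to_string(nextId++) + ".tmp"; };
    auto mergeTop = [&] {
        Run b = stack.back(); stack.pop_back();
        Run a = stack.back(); stack.pop_back();
        stack.push_back(mergeRuns(a, b, newPath()));
    };
    for (std::size_t i = 0; i < n; ++i) {
        Run r{newPath(), 1};
        int v = std::rand();
        {
            std::ofstream out(r.path, std::ios::binary | std::ios::trunc);
            out.write(reinterpret_cast<const char*>(&v), sizeof v);
        }
        stack.push_back(r);
        // Merge equal-sized runs, like carries when incrementing a binary counter.
        while (stack.size() >= 2 &&
               stack[stack.size() - 1].size == stack[stack.size() - 2].size) {
            mergeTop();
        }
    }
    while (stack.size() >= 2) mergeTop();  // fold the leftover runs together
    return stack.empty() ? std::string() : stack.back().path;
}
```

Merging only equal-sized runs keeps the stack at O(log N) files, and each value is rewritten O(log N) times, as in an ordinary merge sort.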
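The standard library has a merge sort, but you need random-access iterators to use it. If you can use mmap (or its equivalent), you have random-access iterators (yes, I know that you need to take COUNT from the command line):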
#include <algorithm>
#include <cstdio>
#include <random>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
const size_t COUNT = 4096 * 4096;
int main()
{
    // create file (using mmap for simplicity)
    int fd = open("out.dat", O_RDWR | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        std::perror("open failed");
        return 1;
    }
    if (ftruncate(fd, COUNT * sizeof(unsigned)) != 0) {
        std::perror("ftruncate failed");
        close(fd);
        return 1;
    }
    void* mm = mmap(nullptr, COUNT * sizeof(unsigned), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mm == MAP_FAILED) {
        std::perror("mmap failed");
        close(fd);
        return 1;
    }
    close(fd);

    // populate file
    unsigned* begin = static_cast<unsigned*>(mm);
    std::default_random_engine rng((std::random_device())());
    std::generate_n(begin, COUNT, rng);
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file written");

    // sort file
    std::stable_sort(begin, begin + COUNT);
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file sorted");
    if (std::is_sorted(begin, begin + COUNT)) {
        std::puts("it's properly sorted");
    }

    // close file
    munmap(mm, COUNT * sizeof(unsigned));
    return 0;
}
The msync calls are not actually needed. Honestly, I'm surprised that this performs decently.