Random numbers external sort

Keywords: external sort, random numbers      Updated: 2023-10-16

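I need to write a program that generates N random numbers and writes them to a binary file in descending order. It has to be done without using any sorting algorithm that works in main memory. This is what I have so far: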

#include <iostream>
#include <fstream> 
#include <ctime>
#include <cstdlib>
using namespace std;
int main () {
  srand(time(0));
  rand();
  int N;
  do {
    cout << "Enter N: ";
    if (!(cin >> N)) return 1;  // stop on non-numeric input instead of looping forever
  } while (N <= 0);
  ofstream br("broj.dat", ios::binary | ios::trunc);
  for(int i = 0; i<N; i++){
    int a = rand();
    br.write((char *)&a, sizeof(a));
  }
  br.close();
  return 0;
}

So I have generated the random numbers and written them to a binary file, but I don't know how to sort them.

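You can generate the numbers already in sorted order, in linear time. The paper describing how to do this is "Generating Sorted Lists of Random Numbers" by Bentley & Saxe: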

https://pdfs.semanticscholar.org/2dbc/4e3f10b88888832fcd5fb8888888d34b8fb0b0b0102000.pdf

/**
 * Generate a list of random numbers sorted in descending order from 1 to 0,
 * given the size of the list being requested.
 * 
 * This is an implementation of an algorithm developed by Bentley and Saxe, and
 * published in ACM Transactions on Mathematical Software (v6, iss3, 1980) on
 * 'Generating Sorted Lists of Random Numbers'.
 */
public class SortedRandomDoubleGenerator {
    private long       valsFound;
    private double     curMax;
    private final long numVals;
    /**
     * Instantiate a generator of sorted random doubles.
     * 
     * @param numVals the size of the list of sorted random doubles to be
     *        generated
     */
    public SortedRandomDoubleGenerator(long numVals) {
        curMax = 1.0;
        valsFound = 0;
        this.numVals = numVals;
    }
    /**
     * @return the next random number, in descending order.
     */
    public double getNext() {
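        // Math.random() stands in for the RandomNumbers.nextDouble() helper,
        // which is not shown here; both are assumed to be uniform on [0, 1).
        curMax = curMax
                * Math.exp(Math.log(Math.random())
                        / (numVals - valsFound));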
        valsFound++;
        return curMax;
    }
}
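
Since the question is in C++, the same update can be sketched there, writing each value straight to the binary file so that no sort is needed at all. This is only a minimal sketch: the function name `writeSortedRandom`, the `seed` parameter, and the scaling of the [0, 1] value to a 32-bit unsigned integer are assumptions for illustration, not part of the paper or of the Java code above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <fstream>
#include <random>
#include <string>

// Writes n random 32-bit values to `path` in descending order without sorting.
// Each step multiplies the running maximum by u^(1/remaining), with u uniform
// on [0, 1), which is the same update as getNext() in the Java class.
void writeSortedRandom(const std::string& path, std::uint64_t n, std::uint32_t seed) {
    assert(n > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);

    double curMax = 1.0;
    for (std::uint64_t found = 0; found < n; ++found) {
        const double u = uniform(rng);
        curMax *= std::pow(u, 1.0 / static_cast<double>(n - found));
        // Assumption: map [0, 1] onto the full uint32 range. The mapping is
        // monotone, so the written integers are still in descending order.
        const auto value = static_cast<std::uint32_t>(curMax * 4294967295.0);
        out.write(reinterpret_cast<const char*>(&value), sizeof(value));
    }
}
```

Calling `writeSortedRandom("broj.dat", N, seed)` would replace both the generation loop and the sorting step. Note that the values come out as scaled doubles rather than raw `rand()` results, which may or may not be acceptable for the assignment.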

Here is pseudocode for how I would do it.

for i in 1..N:
    write rand() to new file
    push onto file stack (new file, size=1)
    while 2 <= len(file stack) and size of top two files the same:
        pop top two and merge them
        push onto file stack (merged file, size=new size)
while 2 <= len(file stack):
    pop top two and merge them
    push onto file stack (merged file, size=new size)
The top of the file stack is your new sorted file.
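
The "merge them" step is where the external sort does its work. It can be sketched in C++ as a streaming merge that holds only one value from each input file in memory at a time. The function name `mergeDescending` and the file layout (raw `int` values, as written by the program in the question) are assumptions for illustration; the output path must differ from both input paths.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Merges two binary files of ints, each already in descending order,
// into `outPath`, which ends up in descending order as well.
void mergeDescending(const std::string& aPath, const std::string& bPath,
                     const std::string& outPath) {
    std::ifstream a(aPath, std::ios::binary);
    std::ifstream b(bPath, std::ios::binary);
    std::ofstream out(outPath, std::ios::binary | std::ios::trunc);
    assert(a && b && out);

    // Reads one int from a stream; returns false once the file is exhausted.
    auto readOne = [](std::ifstream& in, int& value) {
        return static_cast<bool>(in.read(reinterpret_cast<char*>(&value), sizeof(value)));
    };

    int x = 0, y = 0;
    bool hasX = readOne(a, x);
    bool hasY = readOne(b, y);
    while (hasX || hasY) {
        // Emit the larger of the two current heads, then refill from that file.
        if (hasX && (!hasY || x >= y)) {
            out.write(reinterpret_cast<const char*>(&x), sizeof(x));
            hasX = readOne(a, x);
        } else {
            out.write(reinterpret_cast<const char*>(&y), sizeof(y));
            hasY = readOne(b, y);
        }
    }
}
```

Starting from one-element files, each number takes part in about log2(N) merges, so the whole procedure performs on the order of N log N sequential reads and writes.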

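The standard library has a merge sort, but it requires random access iterators. If you can use mmap (or its equivalent), you have random access iterators (yes, I know you need to take COUNT from the command line):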

#include <algorithm>
#include <cstdio>
#include <random>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
const size_t COUNT = 4096 * 4096;
int main()
{
    // create file (using mmap for simplicity)
    int fd = open("out.dat", O_RDWR | O_TRUNC | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0) {
        std::perror("open failed");
        return 1;
    }
    if (ftruncate(fd, COUNT * sizeof(unsigned)) != 0) {
        std::perror("ftruncate failed");
        close(fd);
        return 1;
    }
    void* mm = mmap(nullptr, COUNT * sizeof(unsigned), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mm == MAP_FAILED) {
        std::perror("mmap failed");
        close(fd);
        return 1;
    }
    close(fd);
    // populate file
    unsigned* begin = static_cast<unsigned*>(mm);
    std::default_random_engine rng((std::random_device())());
    std::generate_n(begin, COUNT, rng);
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file written");
    // sort file
    std::stable_sort(begin, begin + COUNT);
    msync(mm, COUNT * sizeof(unsigned), MS_SYNC);
    std::puts("file sorted");
    if (std::is_sorted(begin, begin + COUNT)) {
        std::puts("it's properly sorted");
    }
    // close file
    munmap(mm, COUNT * sizeof(unsigned));
    return 0;
}

The msync calls aren't actually needed. Honestly, I'm surprised this performs decently.
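
One detail worth noting: the question asks for descending order, while `std::stable_sort` without a comparator sorts ascending. A minimal sketch of the adjustment, assuming the same kind of `unsigned*` range as in the mmap code (the helper name `sortDescending` is hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>

// Sorts [begin, begin + count) from largest to smallest.
// `begin` may point into an mmap'ed file, as in the mmap-based program.
void sortDescending(unsigned* begin, std::size_t count) {
    assert(begin != nullptr || count == 0);
    std::stable_sort(begin, begin + count, std::greater<unsigned>());
}
```

The matching check would be `std::is_sorted(begin, begin + COUNT, std::greater<unsigned>())`.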