快速排序在对降序-升序数据进行排序时的奇怪行为

Quicksort's strange behaviour when sorting descending-ascending data

本文关键字:排序 降序 数据 升序 快速排序      更新时间:2023-10-16

当按升序(100,99,..,0,99,100)对包含数字的数组进行排序时,为什么qs的实现工作如下?:

time for 50000 elements: 0.123 s
time for 100000 elements: 0.288 s
time for 150000 elements: 0.629 s
time for 200000 elements: 0.695 s
time for 250000 elements: 1.652 s
time for 300000 elements: 1.663 s
time for 350000 elements: 3.404 s
time for 400000 elements: 4.185 s
time for 450000 elements: 3.887 s
time for 500000 elements: 6.73 s
time for 550000 elements: 8.887 s
time for 600000 elements: 9.137 s
time for 650000 elements: 11.094 s
time for 700000 elements: 8.436 s
time for 750000 elements: 15.182 s

对于700000个元素,它的工作速度比650000个元素更快。我把测试重复了好几次。这是代码:

    #include <iostream>
    #include <cstdlib>
    #include <time.h>
    #include <new>
    #include <math.h>
    using namespace std;
    inline void swap (int *a, int *b)
    {
        int tmp;
        tmp = *a;
        *a = *b;
        *b = tmp;
    }
    void quick_sort(int tab[], int l, int r);
    int *allocate (int size);
    void dec_inc (int tab[], const int length);
    int main()
    {
        int step = 50000;
        int how_many = 15;
        int k = 1;
        int length = step * how_many;
        int *tab = allocate(length);
        clock_t t2, t1;
        long double dif;
        while (step * k <= length)
        {
            dec_inc(tab, step*k);
            t1 = clock();
            quick_sort(tab, 0, step*k - 1);
            t2 = clock();
            dif = (long double)(t2 - t1)/CLOCKS_PER_SEC;
            cout << "time for " << step * k << " elements: " << dif << " s" << endl;
            k++;
        }
        delete [] tab;
        system("pause");
    }
    int *allocate (int size)
    {
        try
        {
            return new int [size];
        }
        catch(bad_alloc)
        {
            cerr << "ERRORn";
            exit(EXIT_FAILURE);
        }
    }
    void quick_sort(int tab[], int l, int r)
    {
        int v = tab[(l+r)/2];
        int i = l;
        int j = r;
        do
        {
            while(tab[i] < v) i++;
            while(tab[j] > v) j--;
            if (i <= j)
            {
                swap(&tab[i], &tab[j]);
                i++, j--;
            }
        }
        while(i <= j);
        if (l < j)
            quick_sort(tab, l, j);
        if(i < r)
            quick_sort(tab, i, r);
    }
    void dec_inc (int tab[], const int length)
    {
        int i = length/2;
        for (int j = 0; j < length/2; j++, i--)
        {
            tab[j] = i;
        }
        for (int j = length/2; j < length; j++, i++)
        {
            tab[j] = i;
        }
    }

使用快速排序的缺点之一是它的稳定性。某些数据集需要比其他数据集更多的步骤进行排序。对于病理病例,它甚至可以标度为O(n^2)。我测量了quicksort对测试数据执行的比较次数,发现在700000个步骤中,要执行的比较比650000个元素少。尽管您的数据集看起来很相似,但显然对于快速排序来说,它们并不相似。有一些方法可以提高快速排序的稳定性,例如参见Programming Pearls。

以下是测量结果:

650000个元素的时间:4.41251秒。编号比较5061169826

700000个元素的时间:3.37787秒。编号比较3824058435

750000个元素的时间:6.07856秒。编号比较6900645055

这里有相应的代码:gist