Subtle Quicksort Stability Issue

Keywords: stability, quicksort. Last updated: 2023-10-16

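I'm trying to write a quicksort implementation for a school project that sorts CSV files, and I'm having a hard time getting it right. According to the project specification, sorting by each column in turn should force an unstable sort to behave stably. For example, "./sort --algorithm quick -k 1,2,3,4,5 input.csv" should produce the same result as "./sort --algorithm insertion -k 1,2,4,5 input.csv".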

To preserve the earlier sorts, the passes are run in reverse key order, as follows:

for (int current_key = config.sort_columns.size()-1; current_key >= 0; current_key--){
    sorting_algorithm(records, config, config.sort_columns[current_key]-1);
}

where config.sort_columns is a vector of all the sort columns given with the -k option (1-based, hence the -1).

Here is my input:

name,breed,date of birth,date of death,avg eggs per week 
Marilyn,Isa Red,2011-04-24,N/A,6 
Brian,Derp,2010-01-15,2011-12-01,4 
Chrissy,Ent,2012-02-08,N/A,3 
Mildred,Araucana,2011-05-01,N/A,3 
Jimmy,Ent,2006-02-30,N/A,15 
Mabel,Isa Red,2011-04-26,N/A,5 
Myrtle,Araucana,2011-08-01,N/A,0 
Myrtle,Araucana,2011-05-01,2011-07-13,0 
Adam,Ent,2010-01-01,N/A,10 
Isabel,Ent,2009-04-01,N/A,2 
Jimmy,Ent,2006-02-30,2011-03-24,1 
Jimmy,Derp,2003-02-30,2010-03-24,8 
Myrtle,Herp,2011-08-01,N/A,0

Here is the output of my insertion sort (which should be, and appears to be, correct):

name,breed,date of birth,date of death,avg eggs per week 
Adam,Ent,2010-01-01,N/A,10 
Brian,Derp,2010-01-15,2011-12-01,4 
Chrissy,Ent,2012-02-08,N/A,3 
Isabel,Ent,2009-04-01,N/A,2 
Jimmy,Derp,2003-02-30,2010-03-24,8 
Jimmy,Ent,2006-02-30,2011-03-24,1 
Jimmy,Ent,2006-02-30,N/A,15 
Mabel,Isa Red,2011-04-26,N/A,5 
Marilyn,Isa Red,2011-04-24,N/A,6 
Mildred,Araucana,2011-05-01,N/A,3 
Myrtle,Araucana,2011-05-01,2011-07-13,0 
Myrtle,Araucana,2011-08-01,N/A,0 
Myrtle,Herp,2011-08-01,N/A,0

Here is my quicksort output:

name,breed,date of birth,date of death,avg eggs per week
Adam,Ent,2010-01-01,N/A,10
Brian,Derp,2010-01-15,2011-12-01,4
Chrissy,Ent,2012-02-08,N/A,3
Isabel,Ent,2009-04-01,N/A,2
Jimmy,Ent,2006-02-30,2011-03-24,1
Jimmy,Ent,2006-02-30,N/A,15
Jimmy,Derp,2003-02-30,2010-03-24,8
Mabel,Isa Red,2011-04-26,N/A,5
Marilyn,Isa Red,2011-04-24,N/A,6
Mildred,Araucana,2011-05-01,N/A,3
Myrtle,Herp,2011-08-01,N/A,0
Myrtle,Araucana,2011-08-01,N/A,0
Myrtle,Araucana,2011-05-01,2011-07-13,0

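As you can see, this is almost correct, except that the second column is wrong when the first column matches (e.g. "Derp" should come before both "Ent" rows).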

Finally, here is my quicksort implementation:

int sort_quick_partition(std::vector<Record> &records, bool (*comparison)(string, string), int sort_key, int left, int right){
    /*
    Partition records[left..right] around the pivot records[right] and return the pivot's final index.
    Records whose sort_key field satisfies comparison(field, pivot) end up to the left of the pivot.
    */
    // Setup 
    int store_location;
    Record pivot = records[right];
    Record temp_record;
    // Loop through and partition the vector within the provided bounds
    store_location = left - 1;
    for (int j = left; j < right; j++){
        if (comparison(records[j].fields[sort_key],pivot.fields[sort_key])){
            store_location += 1;
            std::swap(records[store_location], records[j]);
        }
    }
    std::swap(records[store_location+1], records[right]);
    return store_location+1;
}
void sort_quick_helper(std::vector<Record> &records, bool (*comparison)(string, string), int sort_key, int left, int right){
    /*
    Actually performs the quick sort.
    @param &records - Vector to sort (only the range [left, right] is touched).
    @param *comparison - Comparison to perform.
    @param sort_key - Which column to sort by.
    @param left - Left side of active sort zone.
    @param right - Right side of active sort zone.
    */
    // Setup
    int new_pivot;
    // Make sure the list has 2 or more items
    if (left < right){
        // Partition the vector and get the new pivot value
        new_pivot = sort_quick_partition(records, comparison, sort_key, left, right);
        // Sort elements less than the pivot
        sort_quick_helper(records, comparison, sort_key, left, new_pivot-1);
        // Sort elements greater than the pivot
        sort_quick_helper(records, comparison, sort_key, new_pivot+1, right);
    }
}
void sort_quick(std::vector<Record> &records, Configuration &config, int sort_key){
    /*
    Perform a quick sort on the records.
    @param &records - Vector of Record structures representing the file.
    @param &config - Reference to a global configuration structure.
    @param sort_key - Which column to sort by.
    */
    // Decide if it needs to be reversed
    bool (*comparison)(string, string);
    if (config.reverse){
        comparison = &comparison_gte;
    } else {
        comparison = &comparison_lte;
    }
    // Call the sort
    sort_quick_helper(records, comparison, sort_key, 0, records.size()-1);
}

Note that "sorting_algorithm" is a function pointer to the active sort, in this case "sort_quick".

Does anyone see what might be going wrong? I've been at this for days and I'm ready to pull my hair out. Thanks, everyone!

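It is not true that you can get a stable sort out of an unstable one by sorting repeatedly. Consider the last pass: when it sees equal keys, nothing guarantees that it preserves the order left by the previous passes (that is exactly what "unstable" means).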

Instead, you need to sort on a key that totally orders the input. That means doing a single sort whose key is all of the columns, not just one.

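So when you compare the records "Myrtle,Araucana,2011-08-01,N/A,0" and "Myrtle,Araucana,2011-05-01,2011-07-13,0", you need to compare the fields in order until you find a pair that differs. (This is called lexicographic ordering.) If you need to preserve the order of records that are completely equal, you may even need to include each record's original position as a final tie-breaker.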

Of course, if this weren't homework, you could just use std::stable_sort. (Stable-sorting by each column in reverse order works fine.)

Well, your sort looks like it should be stable, since you chose the rightmost element as the pivot. However, the last line

std::swap(records[store_location+1], records[right]);

swaps the two records even when they are equal. Add a check so that they are swapped only when they are not equal:

// You'll probably use your comparison() function here.
if ( records[store_location+1].fields[sort_key] != records[right].fields[sort_key] ) {
    std::swap(records[store_location+1], records[right]);
}