Throwing the fattest people off of an overloaded airplane.

Keywords: overloaded airplane. Updated: 2023-10-16

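Suppose you have an airplane, and it is low on fuel. Unless the plane sheds 3000 pounds of passenger weight, it will not be able to reach the next airport. To save the maximum number of lives, we would like to throw the heaviest people off the plane first.

Oh, and there are millions of people on the plane, and we would like an optimal algorithm to find the heaviest passengers, without necessarily sorting the entire list.

This is a proxy problem for something I'm trying to code in C++. I would like to do a "partial_sort" on the passenger manifest by weight, but I don't know how many elements I'm going to need. I could implement my own "partial_sort" algorithm ("partial_sort_accumulate_until"), but I'm wondering if there's an easier way to do this using the standard STL.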
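
To pin down what the hypothetical `partial_sort_accumulate_until` should return, here is a minimal baseline sketch in Python (the language of the two Python answers below). It deliberately does the full O(n log n) sort the question wants to avoid, so it is only a reference for checking faster approaches; the function name `toss_by_full_sort` and the 3000 lb default are illustrative.

```python
def toss_by_full_sort(weights, target=3000):
    """Reference answer: sort everyone, heaviest first, and take people
    until their combined weight reaches `target`.

    Returns the tossed weights (heaviest first). If everyone together
    weighs less than `target`, everyone is returned.
    """
    tossed = []
    total = 0
    for w in sorted(weights, reverse=True):  # O(n log n) full sort
        if total >= target:
            break
        tossed.append(w)
        total += w
    return tossed


print(toss_by_full_sort([1200, 900, 700, 650, 300, 100]))  # [1200, 900, 700, 650]
```

Any faster method should toss the same set of weights as this baseline.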

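This won't help with your proxy problem, however:

For 1,000,000 passengers to drop 3000 pounds, each passenger must lose (3000/1,000,000) = 0.003 lbs. That could be achieved by jettisoning everyone's shirts, or shoes, or probably even fingernail clippings, saving everyone. This assumes efficient collection and jettison before the required weight loss grows as the plane burns more fuel.

Actually, they don't allow fingernail clippers on board anymore, so that's out.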

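One way would be to use a min heap (std::priority_queue in C++, given a comparator that keeps the lightest passenger on top, since std::priority_queue is a max heap by default). Here's how you'd do it, assuming you had a MinHeap class. (Yes, my example is in C#. I think you get the idea.)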

int targetTotal = 3000;
int totalWeight = 0;
// this creates an empty heap!
var myHeap = new MinHeap<Passenger>(/* need comparer here to order by weight */);
foreach (var pass in passengers)
{
    if (totalWeight < targetTotal)
    {
        // unconditionally add this passenger
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
    else if (pass.Weight > myHeap.Peek().Weight)
    {
        // If this passenger is heavier than the lightest
        // passenger already on the heap,
        // then remove the lightest passenger and add this one
        var oldPass = myHeap.RemoveFirst();
        totalWeight -= oldPass.Weight;
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
}
// At this point, the heaviest people are on the heap,
// but there might be too many of them.
// Remove the lighter people until we have the minimum necessary
while ((totalWeight - myHeap.Peek().Weight) >= targetTotal)
{
    var oldPass = myHeap.RemoveFirst();
    totalWeight -= oldPass.Weight; 
}
// The heap now contains the passengers who will be thrown overboard.
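
The same loop can be sketched in Python, using the standard `heapq` module (a min-heap on a plain list) in place of the hypothetical MinHeap class and working on bare weights rather than Passenger objects. The name `heaviest_until` is illustrative; `heapq.heapreplace` does the "remove the lightest, add this one" step in a single call.

```python
import heapq


def heaviest_until(weights, target=3000):
    """Min-heap selection of the fewest heaviest passengers whose total
    weight reaches `target`. heap[0] is always the lightest condemned."""
    heap = []   # min-heap of the currently condemned weights
    total = 0
    for w in weights:
        if total < target:
            # Still short of the target: add unconditionally.
            heapq.heappush(heap, w)
            total += w
        elif w > heap[0]:
            # Heavier than the lightest condemned passenger: swap them.
            total += w - heapq.heapreplace(heap, w)
        # Otherwise the passenger is rejected by an O(1) peek.
    # Spare the lightest while the rest still reach the target.
    while heap and total - heap[0] >= target:
        total -= heapq.heappop(heap)
    return heap


print(sorted(heaviest_until([200, 800, 400, 300, 100, 1200, 900, 700, 650]),
             reverse=True))  # [1200, 900, 800, 700]
```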

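According to the standard references, running time should be proportional to n log k, where n is the number of passengers and k is the maximum number of items on the heap. If we assume that passengers' weights will typically be 100 lbs or more, then it's unlikely that the heap will contain more than 30 items at any time.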
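
A back-of-the-envelope version of that bound, using the question's one million passengers and the 100 lb minimum weight assumed above: the heap stops growing once it holds 3000 lb, so it never holds more than 30 people, and log2 30 is just under 5.

```latex
k \le \left\lceil \frac{3000\ \text{lb}}{100\ \text{lb}} \right\rceil = 30,
\qquad
n \log_2 k \le 10^{6} \cdot \log_2 30 \approx 10^{6} \cdot 4.91 \approx 4.9 \times 10^{6}
```

This counts heap comparisons up to a constant factor; it is not a measured running time.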

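The worst case would be if the passengers are presented in order from lowest weight to highest. That would require that every passenger be added to the heap, and every passenger be removed from the heap. Still, with a million passengers and assuming that the lightest weighs 100 lbs, n log k works out to a reasonably small number.

If you get the passengers' weights in random order, performance is much better. I use something quite like this for a recommendation engine (I select the top 200 items from a list of several million). I typically end up with only 50,000 or 70,000 items actually added to the heap.

I suspect that you'll see something quite similar: the majority of your candidates will be rejected because they're lighter than the lightest person already on the heap. And Peek is an O(1) operation.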
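
A small experiment makes the worst-case versus random-order difference concrete. This sketch reuses the min-heap loop on bare weights and only counts how many passengers are actually pushed onto the heap; the synthetic weights (100 to 200 lb, strictly increasing, then shuffled with a fixed seed) are an assumption for illustration, not real passenger data.

```python
import heapq
import random


def count_heap_inserts(weights, target=3000):
    """Run the min-heap selection and count how many passengers were
    pushed onto the heap (everyone else was rejected by a peek)."""
    heap, total, inserts = [], 0, 0
    for w in weights:
        if total < target:
            heapq.heappush(heap, w)
            total += w
            inserts += 1
        elif w > heap[0]:
            total += w - heapq.heapreplace(heap, w)
            inserts += 1
    return inserts


n = 100_000
ascending = [100 + i * 0.001 for i in range(n)]  # strictly increasing weights
shuffled = ascending[:]
random.seed(1)
random.shuffle(shuffled)

print("ascending:", count_heap_inserts(ascending))  # every passenger is inserted
print("shuffled: ", count_heap_inserts(shuffled))   # only a small fraction
```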

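For more information about the performance of heap select and quick select, see When theory meets practice. Short version: if you're selecting fewer than 1% of the total number of items, then heap select is a clear winner over quick select. For more than 1%, use quick select or a variant like Introselect.

Here's a fairly simple implementation of the straightforward solution. I don't think there is a faster way that is 100% correct.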

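#include <cstddef>
#include <set>
#include <vector>

struct passenger {
    std::size_t weight;  // pounds; name, seat, etc. omitted
};

// Order by weight. A multiset (not a set) keeps passengers of equal weight.
struct by_weight {
    bool operator()(const passenger& a, const passenger& b) const {
        return a.weight < b.weight;
    }
};

std::multiset<passenger, by_weight>
choose_dead(const std::vector<passenger>& passengers, std::size_t threshold)
{
    std::size_t total = 0;
    std::multiset<passenger, by_weight> dead;  // lightest is dead.begin()
    for (const auto& p : passengers) {
        // Take anyone until the threshold is met, then only people
        // heavier than the lightest person already in the dead set.
        if (dead.empty() || total < threshold || p.weight > dead.begin()->weight) {
            dead.insert(p);
            total += p.weight;
            // "Save" the lightest as long as the rest still reach the threshold.
            while (!dead.empty() && total - dead.begin()->weight >= threshold) {
                total -= dead.begin()->weight;
                dead.erase(dead.begin());
            }
        }
    }
    return dead;
}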

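This works by filling the "dead people" set until it meets the threshold. Once the threshold is met, we keep going through the list of passengers looking for anyone heavier than the lightest dead person. When we find one, we add them to the set and then start "saving" the lightest people in the set until we can't save any more.

In the worst case, this performs about the same as sorting the entire list, O(n log n). But in the best case (the "dead list" is filled up properly by the first X people), it performs in O(n).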
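
To see the best and worst cases side by side, here is a Python mirror of the dead-set loop above. A list kept sorted with `bisect.insort` stands in for the ordered set (so `dead[0]` is the lightest); that substitution, the name `dead_set_select` and the synthetic weights are assumptions for illustration. Heaviest-first input fills the dead list with the first few people and rejects everyone else with one comparison; lightest-first input forces an insertion for every passenger.

```python
import bisect


def dead_set_select(weights, threshold=3000):
    """Python mirror of the dead-set loop. Returns (dead, inserts),
    where inserts counts how many passengers were inserted."""
    dead = []   # kept sorted ascending, so dead[0] is the lightest
    total = 0
    inserts = 0
    for w in weights:
        # Take anyone until the threshold is met, then only people heavier
        # than the lightest person already in the dead list.
        if not dead or total < threshold or w > dead[0]:
            bisect.insort(dead, w)   # O(k) list insert; fine for small k
            total += w
            inserts += 1
            # "Save" the lightest while the rest still reach the threshold.
            while dead and total - dead[0] >= threshold:
                total -= dead.pop(0)
    return dead, inserts


n = 10_000
# Best case: heaviest first.
_, best = dead_set_select([200 - i * 0.001 for i in range(n)])
# Worst case: lightest first.
_, worst = dead_set_select([100 + i * 0.001 for i in range(n)])
print("insertions, best case:", best, "worst case:", worst)
```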

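Assuming all passengers will cooperate: use a parallel sorting network. (see also this)

~~Here is a live demo~~

Update: Alternative video (jump to 1:00)

Asking pairs of people to compare-exchange: you can't get faster than this.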
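
The answer doesn't say which network it means; odd-even transposition sort is one simple sorting network built only from compare-exchanges between neighbours, so it is used here as an illustrative stand-in. The sketch simulates it sequentially for one row of seats: in each round every disjoint neighbouring pair compares and swaps "at the same time", and n rounds are enough to sort n passengers. The function name is illustrative.

```python
def odd_even_transposition_sort(weights):
    """Simulate odd-even transposition sort, heaviest first.
    Within a round the compared pairs are disjoint, so on a real plane
    all of them could compare-exchange in parallel."""
    seats = list(weights)
    n = len(seats)
    for rnd in range(n):                    # n rounds always suffice
        start = rnd % 2                     # alternate even and odd pairs
        for i in range(start, n - 1, 2):    # disjoint neighbour pairs
            if seats[i] < seats[i + 1]:     # heavier passenger moves forward
                seats[i], seats[i + 1] = seats[i + 1], seats[i]
    return seats


print(odd_even_transposition_sort([150, 90, 210, 120]))  # [210, 150, 120, 90]
```

After sorting, passengers are tossed from the front of the row until 3000 lb is reached.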

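@Blastfurnace was on the right track. You use quickselect where the pivots are weight thresholds. Each partition splits one set of people into sets, and returns the total weight for each set of people. You continue breaking the appropriate bucket until the buckets corresponding to the heaviest people add up to more than 3000 pounds, and the lowest bucket in that set has exactly one person (that is, it can't be split any further).

This algorithm is linear time on average (with random pivots), but quadratic in the worst case. I think it is the only linear time algorithm.

Here's a Python solution that illustrates this algorithm: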

#!/usr/bin/env python
import math
import numpy as np
import random
OVERWEIGHT = 3000.0
in_trouble = [math.floor(x * 10) / 10
              for x in np.random.standard_gamma(16.0, 100) * 8.0]
dead = []
spared = []
dead_weight = 0.0
while in_trouble:
    m = np.median(list(set(random.sample(in_trouble, min(len(in_trouble), 5)))))
    print("Partitioning with pivot:", m)
    lighter_partition = []
    heavier_partition = []
    heavier_partition_weight = 0.0
    in_trouble_is_indivisible = True
    for p in in_trouble:
        if p < m:
            lighter_partition.append(p)
        else:
            heavier_partition.append(p)
            heavier_partition_weight += p
        if p != m:
            in_trouble_is_indivisible = False
    if heavier_partition_weight + dead_weight >= OVERWEIGHT and not in_trouble_is_indivisible:
        spared += lighter_partition
        in_trouble = heavier_partition
    else:
        dead += heavier_partition
        dead_weight += heavier_partition_weight
        in_trouble = lighter_partition
print("weight of dead people: {}; spared people: {}".format(
    dead_weight, sum(spared)))
print("Dead: ", dead)
print("Spared: ", spared)

Output (from one run; the weights are generated randomly, so yours will differ):

Partitioning with pivot: 121.2
Partitioning with pivot: 158.9
Partitioning with pivot: 168.8
Partitioning with pivot: 161.5
Partitioning with pivot: 159.7
Partitioning with pivot: 158.9
weight of dead people: 3051.7; spared people: 9551.7
Dead:  [179.1, 182.5, 179.2, 171.6, 169.9, 179.9, 168.8, 172.2, 169.9, 179.6, 164.4, 164.8, 161.5, 163.1, 165.7, 160.9, 159.7, 158.9]
Spared:  [82.2, 91.9, 94.7, 116.5, 108.2, 78.9, 83.1, 114.6, 87.7, 103.0, 106.0, 102.3, 104.9, 117.0, 96.7, 109.2, 98.0, 108.4, 99.0, 96.8, 90.7, 79.4, 101.7, 119.3, 87.2, 114.7, 90.0, 84.7, 83.5, 84.7, 111.0, 118.1, 112.1, 92.5, 100.9, 114.1, 114.7, 114.1, 113.7, 99.4, 79.3, 100.1, 82.6, 108.9, 103.5, 89.5, 121.8, 156.1, 121.4, 130.3, 157.4, 138.9, 143.0, 145.1, 125.1, 138.5, 143.8, 146.8, 140.1, 136.9, 123.1, 140.2, 153.6, 138.6, 146.5, 143.6, 130.8, 155.7, 128.9, 143.8, 124.0, 134.0, 145.0, 136.0, 121.2, 133.4, 144.0, 126.3, 127.0, 148.3, 144.9, 128.1]

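Assuming that, as with people's weights, you have a good idea of what the maximum and minimum values are likely to be, use a radix sort to sort them in O(n). Then simply work from the heaviest end of the list towards the lightest. Total running time: O(n). Unfortunately, there isn't a radix sort in the STL, but it's pretty straightforward to write.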
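
A minimal sketch of the idea, assuming weights are whole pounds between 0 and a known `max_weight` (both assumptions; the 1000 lb default is illustrative). With keys that small, a single bucket (counting) pass, i.e. a radix sort with one digit, already sorts in O(n + max_weight); the walk then starts at the heaviest bucket. It returns passenger indices so you know who to toss.

```python
def toss_with_counting_sort(weights, target=3000, max_weight=1000):
    """Bucket passengers by whole-pound weight, then walk from the
    heaviest bucket down until `target` is reached.
    Assumes integer weights in [0, max_weight]. Returns indices."""
    buckets = [[] for _ in range(max_weight + 1)]
    for i, w in enumerate(weights):        # one O(n) pass
        buckets[w].append(i)
    tossed, total = [], 0
    for w in range(max_weight, -1, -1):    # heaviest end first
        for i in buckets[w]:
            if total >= target:
                return tossed
            tossed.append(i)
            total += w
    return tossed                          # everyone, and still short


print(toss_with_counting_sort([250, 310, 180, 990, 720, 860, 400]))  # [3, 5, 4, 6, 1]
```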

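Why don't you use a partial quicksort with a different abort rule than "sorted"? Run it, keep only the heavier half, and continue until that heavier half no longer contains at least the weight that has to be thrown out. Then go back one step in the recursion and sort that list. After that you can start throwing people out from the heavy end of the sorted list.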
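
A minimal Python sketch of that abort rule, written iteratively and with a random pivot (both choices are assumptions; the function name is illustrative). Each step keeps only the people strictly heavier than the pivot, which is always a heaviest-first prefix of the full ordering; when that group would weigh less than the target, the loop stops one step earlier and sorts only the current group.

```python
import random


def toss_partial_quicksort(weights, target=3000):
    """Narrow to the heavier side of a quicksort partition while it still
    weighs at least `target`; then sort that last group and toss from
    the heavy end."""
    current = list(weights)
    while len(current) > 1:
        pivot = random.choice(current)
        heavier = [w for w in current if w > pivot]   # excludes the pivot itself
        if sum(heavier) < target:
            break                  # one step too far: keep `current`
        current = heavier
    tossed, total = [], 0
    for w in sorted(current, reverse=True):           # sort only the small group
        if total >= target:
            break
        tossed.append(w)
        total += w
    return tossed


print(toss_partial_quicksort([200, 800, 400, 300, 100, 1200, 900, 700, 650]))  # [1200, 900, 800, 700]
```

Because the kept group always contains the optimal heaviest-first prefix, the result does not depend on which pivots are drawn.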

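Massively parallel tournament sort:

Assuming the standard three seats on each side of the aisle:

1. Ask the passengers in the window seats to swap with the passenger in the middle seat if they are heavier than the person in the middle seat.
2. Ask the passengers in the middle seats to swap with the passenger in the aisle seat if they are heavier.
3. Ask the passenger in the left aisle seat to swap with the passenger in the right aisle seat if they are heavier.
4. Bubble sort the passengers in the right aisle seats (n steps for n rows): ask the passengers in the right aisle seats to swap with the person in front n - 1 times.
5. Kick them out the door until you reach 3000 pounds.

3 steps + n steps, plus 30 steps if you have a really skinny passenger load.

For a two-aisle plane, the instructions are more complex, but the performance is about the same.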

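I would probably use std::nth_element to partition off the 20 heaviest people in linear time. Then use a more complex method to find and bump off the heaviest of the heavies.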
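
In Python, `numpy.partition` (introselect by default) plays roughly the role of std::nth_element: it places the k largest values after index n - k without ordering them, in linear time on average. This sketch partitions off the k heaviest, sorts only those, and tosses from the top; returning `None` when the k heaviest are not enough is an assumption standing in for the answer's "more complex method", and the names and defaults are illustrative.

```python
import numpy as np


def toss_after_partition(weights, target=3000, k=20):
    """Partition off the k heaviest weights, sort just those, and toss
    heaviest first. Returns None if the k heaviest are not enough."""
    w = np.asarray(weights, dtype=float)
    if len(w) == 0:
        return None
    k = min(k, len(w))
    heaviest = np.partition(w, len(w) - k)[len(w) - k:]  # k largest, unordered
    tossed, total = [], 0.0
    for x in np.sort(heaviest)[::-1]:                     # heaviest first
        if total >= target:
            break
        tossed.append(float(x))
        total += x
    return tossed if total >= target else None


print(toss_after_partition([1200, 900, 700, 650, 300, 100], k=4))  # [1200.0, 900.0, 700.0, 650.0]
print(toss_after_partition([100] * 30))                             # None
```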

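You could make one pass through the list to get the mean and standard deviation, then use that to approximate the number of people who have to go. Use partial_sort to generate the list based on that number. If the guess was low, use partial_sort again on the remainder with a new guess.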
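
A rough Python sketch of the guess-and-retry idea, with `heapq.nlargest` (which returns the n largest, heaviest first) standing in for std::partial_sort. Two simplifications are assumptions: the guess assumes the tossed passengers weigh about one standard deviation above the mean, and a low guess is handled by doubling it and redoing `nlargest` from scratch rather than partially sorting only the remainder. `statistics.fmean` needs Python 3.8 or later.

```python
import heapq
import math
import statistics


def toss_with_estimate(weights, target=3000):
    """Guess how many must go from mean and standard deviation, take that
    many heaviest, and retry with a doubled guess if it was too low."""
    mean = statistics.fmean(weights)
    sd = statistics.pstdev(weights)
    # Assumed heuristic: the tossed weigh about one SD more than average.
    spread = mean + sd
    guess = max(1, math.ceil(target / spread)) if spread > 0 else len(weights)
    while True:
        top = heapq.nlargest(guess, weights)   # heaviest first
        total = 0
        for i, w in enumerate(top):
            total += w
            if total >= target:
                return top[: i + 1]
        if guess >= len(weights):
            return top                         # everyone, and still short
        guess *= 2                             # guess was low: try a bigger one


print(toss_with_estimate([1200, 900, 700, 650, 300, 100]))  # [1200, 900, 700, 650]
```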

@James has the answer in the comments: a std::priority_queue if you can use any container, or a combination of std::make_heap and std::pop_heap (and std::push_heap) if you want to use something like a std::vector.
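
A Python analogue of the make_heap/pop_heap combination: `heapq.heapify` builds a heap over the whole list in O(n) (like std::make_heap), and each `heapq.heappop` removes the current heaviest (like std::pop_heap), so tossing k people costs O(n + k log n). Since `heapq` is a min-heap, the sketch negates the weights to get max-heap behaviour; the function name is illustrative.

```python
import heapq


def toss_lazily(weights, target=3000):
    """Heapify everyone once, then pop the heaviest until `target` is met."""
    heap = [-w for w in weights]   # negate: heapq is a min-heap
    heapq.heapify(heap)            # O(n), like std::make_heap
    tossed, total = [], 0
    while heap and total < target:
        w = -heapq.heappop(heap)   # O(log n), like std::pop_heap
        tossed.append(w)
        total += w
    return tossed


print(toss_lazily([200, 800, 400, 300, 100, 1200, 900, 700, 650]))  # [1200, 900, 800, 700]
```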

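Here's a heap-based solution using Python's built-in heapq module. It's in Python, so it doesn't answer the original question, but it's cleaner (IMHO) than the other posted Python solution.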

import itertools, heapq
# Test data
from collections import namedtuple
Passenger = namedtuple("Passenger", "name seat weight")
passengers = [Passenger(*p) for p in (
    ("Alpha", "1A", 200),
    ("Bravo", "2B", 800),
    ("Charlie", "3C", 400),
    ("Delta", "4A", 300),
    ("Echo", "5B", 100),
    ("Foxtrot", "6F", 100),
    ("Golf", "7E", 200),
    ("Hotel", "8D", 250),
    ("India", "8D", 250),
    ("Juliet", "9D", 450),
    ("Kilo", "10D", 125),
    ("Lima", "11E", 110),
    )]
# Find the heaviest passengers whose total weight is at least 3000,
# without tossing anyone who isn't needed to reach that total
to_toss = []
total_weight = 0.0
for passenger in passengers:
    weight = passenger.weight
    total_weight += weight
    heapq.heappush(to_toss, (weight, passenger))
    while total_weight - to_toss[0][0] >= 3000:
        weight, reprieved_passenger = heapq.heappop(to_toss)
        total_weight -= weight

if total_weight < 3000:
    # Not enough people!
    raise Exception("We're all going to die!")
# List the ones to toss. (Order doesn't matter.)
print("We can get rid of", total_weight, "pounds")
for weight, passenger in to_toss:
    print("Toss {p.name!r} in seat {p.seat} (weighs {p.weight} pounds)".format(p=passenger))

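If k = the number of passengers to toss and N = the number of passengers, then the best case for this algorithm is O(N) and the worst case is O(N log N). The worst case occurs if k is near N for a long time. Here's an example of the worst case:

weights = [2500] + [0.5 ** n for n in range(100000)] + [3000]

However, in this case (throwing people off the plane, with a parachute I presume), k can be at most about 3000 as long as everyone weighs at least a pound, which is << "millions of people". The average running time should therefore be about N log(k), which is linear in the number of people.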
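
To check that the worst-case input really makes k approach N, this sketch runs the heapq approach from the answer above on bare weights and also records the largest size the heap reaches. The input has the same shape as the worst case: one 2500 lb passenger, N almost weightless ones, then one 3000 lb passenger; using 1/(n+1)**2 for the tiny weights is an assumption that keeps every one of them a positive float whose sum stays below 1.65, so the target is not reached until the last passenger boards the heap.

```python
import heapq


def toss_and_peak(weights, target=3000):
    """Heap-based selection on bare weights, also reporting the peak heap
    size (how close k gets to N)."""
    to_toss, total, peak = [], 0.0, 0
    for w in weights:
        total += w
        heapq.heappush(to_toss, w)
        peak = max(peak, len(to_toss))
        # Reprieve the lightest while the rest still reach the target.
        while to_toss and total - to_toss[0] >= target:
            total -= heapq.heappop(to_toss)
    return to_toss, peak


N = 100_000
weights = [2500] + [1.0 / (n + 1) ** 2 for n in range(N)] + [3000]
to_toss, peak = toss_and_peak(weights)
# Everyone is on the heap at once just before the 3000 lb passenger arrives.
print("peak heap size:", peak, "of", len(weights), "passengers")
```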