Why does my binary search need an extra comparison? log2(N)+1

Updated: 2023-10-16

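I want to find the index of the first integer in a sorted array of integers that is >= key. I can do it with a binary search in log2(N)+1 comparisons. Shouldn't it be possible with only log2(N) comparisons?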

// Returns the index of the first integer in keys that is >= key, or size if
// there is none. keys must be sorted ascending; size must be a power of 2.
unsigned find(int key, int* keys, unsigned size) {
    int* origKeys = keys;
    // Runs log2(size) times: size goes size/2, size/4, ..., 1.
    while (size > 1) {
        size /= 2;
        if (keys[size] < key)
            keys += size;
    }
    unsigned i = unsigned(keys - origKeys);
    // Not only is this step messy, because it doesn't fit the loop, but
    // now we have log2(n) + 1 comparisons.
    if (*keys < key)
        i++;
    return i;
}

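Let's think about this from an information-theoretic perspective. If you have an array with n elements, there are n+1 possible positions where the new element could go: before any one of the n elements, or after all of them. Your algorithm therefore needs to make enough comparisons to uniquely identify which of the n+1 positions is the right one. If you don't make enough comparisons, the answer you give back won't always be correct.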

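In the best case, every comparison you make eliminates half of the remaining positions. So in the theoretical limit, k comparisons let you decide which of 2^k positions is correct. Since there are n+1 possible positions, you need ceil(lg(n+1)) comparisons in the worst case, not lg n. Because your n is a perfect power of two, this means one extra comparison is required. On the other hand, if n is one less than a perfect power of two, then n+1 is itself a power of two and lg(n+1) comparisons suffice, which equals ceil(lg n) for n > 1.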
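As a rough illustration of the counting argument above, the bound can be computed directly. This is only a sketch: min_comparisons is a hypothetical helper name, not something from the question, and it simply finds the smallest k with 2^k >= n+1.

```cpp
// Smallest k such that 2^k >= n + 1, i.e. ceil(lg(n + 1)).
// min_comparisons is a hypothetical helper introduced for illustration only.
unsigned min_comparisons(unsigned n) {
    unsigned k = 0;
    unsigned long long outcomes = 1;  // 2^k answers distinguishable after k comparisons
    while (outcomes < static_cast<unsigned long long>(n) + 1) {
        outcomes *= 2;
        ++k;
    }
    return k;
}
```

For n = 8 (a power of two) there are 9 possible positions, so this gives 4, one more than lg 8 = 3. For n = 7 there are 8 positions and it gives 3.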

Edit by Eloff: this code seems to give the correct answer in log2(n+1) steps, as you predicted:

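// Returns the index of the first integer in keys that is >= key, or size if
// there is none. keys must be sorted ascending; size must be one less than
// a power of 2.
unsigned find(int key, int* keys, unsigned size) {
    int* origKeys = keys;
    size++;  // size + 1 possible answers, now a power of 2
    while (size > 1) {
        size /= 2;
        // One comparison per halving: log2(n + 1) in total.
        if (keys[size - 1] < key)
            keys += size;
    }
    return unsigned(keys - origKeys);
}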
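One way to sanity-check the version above is to count its comparisons and compare its result with std::lower_bound. The following is a minimal sketch under the same assumptions (ascending input, size one less than a power of two). The names find_counted and matches_lower_bound and the comparisons out-parameter are made up for this example.

```cpp
#include <algorithm>
#include <cassert>

// Same search as above, instrumented to count comparisons.
// find_counted and `comparisons` are hypothetical names for this sketch.
// Precondition: keys is sorted ascending and size + 1 is a power of two.
unsigned find_counted(int key, const int* keys, unsigned size, unsigned& comparisons) {
    assert(((size + 1) & size) == 0 && "size must be one less than a power of 2");
    const int* origKeys = keys;
    comparisons = 0;
    size++;
    while (size > 1) {
        size /= 2;
        ++comparisons;
        if (keys[size - 1] < key)
            keys += size;
    }
    return static_cast<unsigned>(keys - origKeys);
}

// Returns true if find_counted agrees with std::lower_bound for every key in [lo, hi].
bool matches_lower_bound(const int* keys, unsigned size, int lo, int hi) {
    for (int key = lo; key <= hi; ++key) {
        unsigned comparisons = 0;
        unsigned got = find_counted(key, keys, size, comparisons);
        unsigned want = static_cast<unsigned>(std::lower_bound(keys, keys + size, key) - keys);
        if (got != want)
            return false;
    }
    return true;
}
```

With the seven keys {1, 3, 5, 7, 9, 11, 13}, every lookup should use exactly 3 comparisons, which is log2(7 + 1).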