Recursive binary search for a string - C++

Keywords: search, C++, binary, recursion, string. Updated: 2023-10-16

I am trying to implement the function findMatchesInDict, which checks whether a word matches any word in a pre-sorted dictionary. Here is my current implementation:

void findMatchesInDict(string word, int start, const string dict[], int end, string results[], int& totalResults)
{
    // initial start = 0 index
    // initial end = last index of dict array
    int middle = start + (end - start) / 2;
    if (end < start)
        return;
    if (word == dict[middle]) // if we found a match
        storeUniqueMatches(word, 0, results, totalResults);
    else if (word < dict[middle])
        findMatchesInDict(word, start, dict, middle - 1, results, totalResults);
    else
        findMatchesInDict(word, middle + 1, dict, end, results, totalResults);
}
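To try the binary version outside the original program, here is a standalone sketch. It keeps the question's inclusive bounds (`end` is the last valid index) and the same recursion, but, as an assumption for illustration, the hypothetical `findIndexInclusive` returns the matching index (or -1) instead of calling `storeUniqueMatches`, whose code isn't shown.

```cpp
#include <cassert>  // assert() for the usage checks
#include <string>

// Hypothetical standalone variant of findMatchesInDict:
// same parameter order and inclusive range [start, end], but it returns
// the index of the match (or -1) instead of storing it in results[].
int findIndexInclusive(const std::string& word, int start,
                       const std::string dict[], int end)
{
    if (end < start)              // empty range: word is not present
        return -1;
    const int middle = start + (end - start) / 2;  // avoids overflow of start + end
    if (word == dict[middle])
        return middle;
    if (word < dict[middle])      // continue in [start, middle - 1]
        return findIndexInclusive(word, start, dict, middle - 1);
    return findIndexInclusive(word, middle + 1, dict, end);  // [middle + 1, end]
}
```

It is meant to be called as `findIndexInclusive(word, 0, dict, size - 1)` on a sorted array, as the comments in the question describe. If the initial call passed `size` as `end` instead, a word that sorts after every entry would eventually make `middle == size` and read one element past the end of the array.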

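The storeUniqueMatches function works fine (it just stores the matching word in the results array, making sure no duplicates are stored).

The function only matches some of the words in the dictionary, but not others.

Any ideas why this might not be working correctly?

For reference, the following implementation works, but it is extremely inefficient and causes a stack overflow error.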
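The linear version recurses once per dictionary entry, so its call depth grows with the size of the dictionary, while the binary version only recurses about log2(n) levels deep. For comparison, here is a minimal sketch of the same membership test using the standard library's `std::binary_search`, which needs no user-written recursion at all. It assumes `dict` is sorted with `std::string`'s `operator<`; the helper name `isInDict` is hypothetical.

```cpp
#include <algorithm>  // std::binary_search, std::is_sorted
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if word occurs in the sorted array dict[0 .. size).
bool isInDict(const std::string& word, const std::string dict[], std::size_t size)
{
    // Debug-only precondition check (O(n), removed when NDEBUG is defined):
    // binary_search requires the range to be sorted by the same operator<.
    assert(std::is_sorted(dict, dict + size));
    return std::binary_search(dict, dict + size, word);
}
```

If the caller needs the matching entry itself (for example, to store it in `results`), `std::lower_bound` returns an iterator to the first element not less than `word`, which can then be compared with `word`.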

void findMatchesInDict(string word, int start, const string dict[], int end, string results[], int& totalResults)
{
    if (start > end)
        return;
    if (word == dict[start]) // if we found a match
        storeUniqueMatches(word, 0, results, totalResults);
    findMatchesInDict(word, start + 1, dict, end, results, totalResults);
}

I still believe the OP made an off-by-one error.

I strongly suspect that

findMatchesInDict(word, start, dict, middle - 1, results, totalResults);

should be

findMatchesInDict(word, start, dict, middle, results, totalResults);
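Whether `middle - 1` or `middle` is right depends on what `end` means. With an inclusive `end` (the last valid index), the left half is `[start, middle - 1]`; with an exclusive `end` (one past the last index), the left half is `[start, middle)` and the recursive call passes `middle`. Mixing the two conventions is a typical source of off-by-one bugs. A minimal sketch of the exclusive-end variant, under the hypothetical name `findIndexHalfOpen`, returning the index instead of storing the match:

```cpp
#include <cassert>  // assert() for the usage checks
#include <string>

// Hypothetical variant with a half-open range [start, end):
// end is one past the last index, so the initial call uses end = size.
int findIndexHalfOpen(const std::string& word, int start,
                      const std::string dict[], int end)
{
    if (start >= end)             // empty range: word is not present
        return -1;
    const int middle = start + (end - start) / 2;  // start <= middle < end
    if (word == dict[middle])
        return middle;
    if (word < dict[middle])      // left half is [start, middle): pass middle
        return findIndexHalfOpen(word, start, dict, middle);
    return findIndexHalfOpen(word, middle + 1, dict, end);  // [middle + 1, end)
}
```

Passing `middle` while `end` is still treated as inclusive does not work: once `start == end` and the word sorts before `dict[middle]`, the call repeats with the same arguments and never terminates.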

I made a small sample of my own. (In doing so, I reworked the code a bit, as I wasn't comfortable with the OP's style.)

#include <iostream>
#include <string>

size_t find(const std::string &word, const std::string dict[], size_t i0, size_t size)
{
    if (!size) return (size_t)-1; // bail out with invalid index
    const size_t i = i0 + size / 2;
    return word == dict[i]
        ? i
        : word < dict[i]
            ? find(word, dict, i0, i - i0)
            : find(word, dict, i + 1, i0 + size - (i + 1));
}

int main()
{
    const std::string dict[] = {
        "Ada", "BASIC", "C", "C++",
        "D", "Haskell", "INTERCAL", "Modula2",
        "Oberon", "Pascal", "Scala", "Scratch",
        "Vala"
    };
    const size_t sizeDict = sizeof dict / sizeof *dict;
    unsigned nErrors = 0;
    // brute-force tests: every entry that is present must be found
    for (size_t n = 1; n <= sizeDict; ++n) {
        for (size_t i = 0; i < n; ++i) {
            if (find(dict[i], dict, 0, n) >= n) {
                std::cerr << "ALERT! Unable to find entry " << i << " in " << n << " entries!\n";
                ++nErrors;
            }
        }
    }
    // brute-force tests: strings that are not present must not be found
    for (size_t n = 1; n <= sizeDict; ++n) {
        if (find("", dict, 0, n) < n) {
            std::cerr << "ALERT! Able to find entry '' in " << n << " entries!\n";
            ++nErrors;
        }
        for (size_t i = 0; i < n; ++i) {
            if (find(dict[i] + " + Assembler", dict, 0, n) < n) {
                std::cerr << "ALERT! Able to find entry '" << dict[i] << " + Assembler' in " << n << " entries!\n";
                ++nErrors;
            }
        }
    }
    // report
    if (!nErrors) std::cout << "All tests passed OK.\n";
    else std::cerr << nErrors << " tests failed!\n";
    // done
    return nErrors > 0;
}

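Live Demo on coliru

Most of this code is brute-force test code:

For each length of dict from 1 to size, every entry within that length is searched for and must be found.
For each length of dict from 1 to size, the empty string (which sorts before every other entry) and a modified copy of every entry are searched for and must not be found. (The modification makes the string sort between the unmodified entry and its successor, or after the last entry.)

Output: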

All tests passed OK.

Everything fine.

Then I replaced

find(word, dict, i0, i - i0)

with

find(word, dict, i0, i - i0 > 0 ? i - i0 - 1 : 0)

which (in my opinion) resembles the bug in the OP's code.
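For a single concrete failure, here is the modified search next to the original (the names `findCorrect`, `findBuggy` and `notFound` are hypothetical). With two entries, searching for entry 0 ("Ada") probes index 1, goes left, and the shortened length `i - i0 - 1` becomes 0, so index 0 is never examined; this corresponds to the first ALERT line of the output.

```cpp
#include <cassert>  // assert() for the usage checks
#include <cstddef>
#include <string>

const std::size_t notFound = static_cast<std::size_t>(-1);

// Same logic as the answer's find(): searches dict[i0 .. i0 + size).
std::size_t findCorrect(const std::string& word, const std::string dict[],
                        std::size_t i0, std::size_t size)
{
    if (!size) return notFound;
    const std::size_t i = i0 + size / 2;
    if (word == dict[i]) return i;
    if (word < dict[i]) return findCorrect(word, dict, i0, i - i0);  // [i0, i)
    return findCorrect(word, dict, i + 1, i0 + size - (i + 1));      // [i + 1, i0 + size)
}

// Deliberately broken copy: the left half loses its last element, dict[i - 1].
std::size_t findBuggy(const std::string& word, const std::string dict[],
                      std::size_t i0, std::size_t size)
{
    if (!size) return notFound;
    const std::size_t i = i0 + size / 2;
    if (word == dict[i]) return i;
    if (word < dict[i]) return findBuggy(word, dict, i0, i - i0 > 0 ? i - i0 - 1 : 0);
    return findBuggy(word, dict, i + 1, i0 + size - (i + 1));
}
```

The right-hand recursion is identical in both versions, which is why only some entries (those reached through a left step onto the dropped element) go missing.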

Output:

ALERT! Unable to find entry 0 in 2 entries!
ALERT! Unable to find entry 0 in 3 entries!
ALERT! Unable to find entry 1 in 4 entries!
ALERT! Unable to find entry 1 in 5 entries!
ALERT! Unable to find entry 3 in 5 entries!
ALERT! Unable to find entry 0 in 6 entries!
ALERT! Unable to find entry 2 in 6 entries!
ALERT! Unable to find entry 4 in 6 entries!
ALERT! Unable to find entry 0 in 7 entries!
ALERT! Unable to find entry 2 in 7 entries!
ALERT! Unable to find entry 4 in 7 entries!
ALERT! Unable to find entry 0 in 8 entries!
ALERT! Unable to find entry 3 in 8 entries!
ALERT! Unable to find entry 5 in 8 entries!
ALERT! Unable to find entry 0 in 9 entries!
ALERT! Unable to find entry 3 in 9 entries!
ALERT! Unable to find entry 6 in 9 entries!
ALERT! Unable to find entry 1 in 10 entries!
ALERT! Unable to find entry 4 in 10 entries!
ALERT! Unable to find entry 7 in 10 entries!
ALERT! Unable to find entry 1 in 11 entries!
ALERT! Unable to find entry 4 in 11 entries!
ALERT! Unable to find entry 7 in 11 entries!
ALERT! Unable to find entry 9 in 11 entries!
ALERT! Unable to find entry 1 in 12 entries!
ALERT! Unable to find entry 3 in 12 entries!
ALERT! Unable to find entry 5 in 12 entries!
ALERT! Unable to find entry 8 in 12 entries!
ALERT! Unable to find entry 10 in 12 entries!
ALERT! Unable to find entry 1 in 13 entries!
ALERT! Unable to find entry 3 in 13 entries!
ALERT! Unable to find entry 5 in 13 entries!
ALERT! Unable to find entry 7 in 13 entries!
ALERT! Unable to find entry 9 in 13 entries!
ALERT! Unable to find entry 11 in 13 entries!
35 tests failed!

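Hmm. In fact, this doesn't prove anything about the OP's code.

However, it does show two things:

An "off by 1" error can fundamentally break a binary search.
Brute-force tests can be designed to catch such errors.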
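The same brute-force idea can be packaged as a reusable check that compares a search function against a plain linear scan for every prefix length and every probe word. This is a minimal sketch; the names `SearchFn`, `makeProbes` and `agreesWithLinearScan` are hypothetical, and the search functions are assumed to use the answer's `(word, dict, i0, size)` signature, returning `(size_t)-1` when nothing is found.

```cpp
#include <cassert>   // assert() for the usage checks
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical signature matching the answer's find():
// search dict[i0 .. i0 + size) and return the index, or (size_t)-1 if absent.
using SearchFn = std::size_t (*)(const std::string&, const std::string[],
                                 std::size_t, std::size_t);

// Probe words in the spirit of the answer's tests: the empty string,
// every entry, and every entry with " + Assembler" appended.
std::vector<std::string> makeProbes(const std::string dict[], std::size_t sizeDict)
{
    std::vector<std::string> probes{ "" };
    for (std::size_t i = 0; i < sizeDict; ++i) {
        probes.push_back(dict[i]);
        probes.push_back(dict[i] + " + Assembler");
    }
    return probes;
}

// Compares search() with a linear scan for every prefix length n of dict
// (including the empty prefix) and every probe. Assumes dict is sorted and
// has no duplicates, so the expected index is unique.
bool agreesWithLinearScan(SearchFn search, const std::string dict[], std::size_t sizeDict,
                          const std::vector<std::string>& probes)
{
    for (std::size_t n = 0; n <= sizeDict; ++n) {
        for (const std::string& word : probes) {
            std::size_t expected = static_cast<std::size_t>(-1);
            for (std::size_t i = 0; i < n; ++i) {      // reference answer: linear scan
                if (dict[i] == word) { expected = i; break; }
            }
            if (search(word, dict, 0, n) != expected)
                return false;
        }
    }
    return true;
}
```

Comparing against a trivially correct reference, rather than hand-writing expected results, is what lets such a test cover every length and position automatically.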

So this will hopefully help the OP find the error in his algorithm himself (which is actually more valuable for him).