Object search on part of the array

Keywords: object, search, part, array      Updated: 2023-10-16

I have objects like this:

class search_object
{
public:
    unsigned int index; // 0 <= index <= 50000
    unsigned int search_field; // 1 <= search_field <= 5000000000, can be duplicates!
};

I have about 50,000 objects of this type. The objects are sorted by index.

My program receives search queries of this form: "Is there any object whose index lies between left_index and right_index (left_index <= index <= right_index) and whose search_field equals Number (search_field == Number)?"

There are about 50,000 queries.

I have a solution, but it is too slow for my system.

My algorithm is:

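  1. Sort the search objects by search_field.
  2. Find lower_index, the first position where search_object[lower_index].search_field == Number (with the lower_bound() function, which is a binary search).
  3. Iterate over the object array from lower_index to the end of the array. If this_object has an index between left_index and right_index, the answer is true. Otherwise, it is false.
  4. Repeat steps 2-3 for every search query (left_index, right_index and Number).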
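
The steps above can be sketched roughly as follows. This is only an illustration of the described approach, not the asker's actual code: the names sort_by_field and exists_in_range are hypothetical, the fields are widened to std::uint64_t because 5000000000 does not fit in a 32-bit unsigned int, and the scan is assumed to stop at the first object whose search_field differs from Number rather than running to the end of the array.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct search_object {
    std::uint64_t index;
    std::uint64_t search_field;
};

// Step 1: sort once by search_field so that equal values are contiguous.
void sort_by_field(std::vector<search_object>& objects) {
    std::sort(objects.begin(), objects.end(),
              [](const search_object& a, const search_object& b) {
                  return a.search_field < b.search_field;
              });
}

// Steps 2-3 for a single query (hypothetical helper).
// `objects` must already be sorted by search_field.
bool exists_in_range(const std::vector<search_object>& objects,
                     std::uint64_t left_index, std::uint64_t right_index,
                     std::uint64_t number) {
    // Step 2: binary search for the first object with search_field >= number.
    auto it = std::lower_bound(objects.begin(), objects.end(), number,
                               [](const search_object& obj, std::uint64_t value) {
                                   return obj.search_field < value;
                               });
    // Step 3: linear scan over the run of objects with a matching search_field.
    for (; it != objects.end() && it->search_field == number; ++it) {
        if (left_index <= it->index && it->index <= right_index) return true;
    }
    return false;
}
```

The linear scan in step 3 is the likely bottleneck: when many objects share the same search_field, a single query can touch all of them.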

I would use a standard container, and I would not use the object itself as the key. Code example:

std::map<decltype(search_object::index), search_object> container;
...
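auto itr = container.lower_bound(left_index); // first object with index >= left_index
for (; itr != container.end() && itr->first <= right_index; ++itr) // advance, otherwise the loop never ends
  if (itr->second.search_field == Number) return itr; // compare the field, not the whole object
return container.end();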

You can use a map<unsigned int, list<search_object*>>, which holds a list of objects for each search_field.
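
A minimal sketch of that idea, under some assumptions: the key type is std::uint64_t instead of unsigned int (the stated upper bound of 5000000000 does not fit in 32 bits), the pointers are const, and the names field_map, build_field_map and exists_in_range are hypothetical.

```cpp
#include <cstdint>
#include <list>
#include <map>
#include <vector>

struct search_object {
    std::uint64_t index;
    std::uint64_t search_field;
};

// One list of object pointers per distinct search_field value.
using field_map = std::map<std::uint64_t, std::list<const search_object*>>;

// Groups pointers to the objects by search_field. The pointers refer into
// `objects`, so that vector must outlive the map and must not be resized.
field_map build_field_map(const std::vector<search_object>& objects) {
    field_map by_field;
    for (const search_object& obj : objects)
        by_field[obj.search_field].push_back(&obj);
    return by_field;
}

// Answers one query by scanning only the objects with the requested field.
bool exists_in_range(const field_map& by_field,
                     std::uint64_t left_index, std::uint64_t right_index,
                     std::uint64_t number) {
    auto found = by_field.find(number);
    if (found == by_field.end()) return false;
    for (const search_object* obj : found->second)
        if (left_index <= obj->index && obj->index <= right_index) return true;
    return false;
}
```

Each query still scans a list linearly, so this helps mainly when few objects share the same search_field.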

This can be done easily by applying the right sort strategy before searching.

#include <iostream>     // std::cout
#include <algorithm>    // std::lower_bound, std::upper_bound, std::sort
#include <cstdint>      // uint64_t
#include <utility>      // std::pair
#include <vector>       // std::vector
class search_object
{
public:
    uint64_t index; // 0 <= index <= 50000
    uint64_t search_field; // 1 <= search_field <= 5000000000, can be duplicates!
};
search_object data[] = { { 13, 54345632 }, { 42, 4645347 }, { 63, 4645347 }, { 117, 4674534536 } };
using table = std::vector<search_object>;
using itr = table::const_iterator;
using range = std::pair<itr, itr>;
template<typename Pred>
range FindRange(const table& vec, Pred pred, search_object lowValue, search_object highValue) {
  // precondition: vec is sorted by pred
  //assert(pred(lowValue, highValue)); // paranoid check
  itr low=std::lower_bound (vec.begin(), vec.end(), lowValue, pred); //
  itr up= std::upper_bound (low,         vec.end(), highValue, pred); // no reason to search before low!
  return range(low, up);
}
int main () {
  std::vector<search_object> FldTable(std::begin(data),std::end(data));
  // sort by primary key field and secondary index.
  auto CmpFld = [] (const search_object& obj1, const search_object& obj2) {
    return obj1.search_field < obj2.search_field || // primary sort
           ((obj1.search_field == obj2.search_field) && // secondary
        (obj1.index < obj2.index)
       );
  };
  // sort by field, then by index
  std::sort (FldTable.begin(), FldTable.end(), CmpFld); // duplicates possible.
  // some "random" search values
  unsigned int lowSearchIndex = 10, highSearchIndex = 100, searchField = 4645347;
  search_object low { lowSearchIndex,  searchField };
  search_object up  { highSearchIndex, searchField };
  range search = FindRange(FldTable, CmpFld, low, up);
  for (itr record = search.first; record < search.second; ++record)
     std::cout << "index = " << record->index << " field = " << record->search_field << "\n";
  return 0;
}

Output

index = 42 field = 4645347
index = 63 field = 4645347
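
Since the question only asks whether a matching object exists, the same ordering (search_field first, index second) also allows a yes/no answer with a single binary search. The following is a sketch under that assumption; by_field_then_index and has_match are hypothetical names, not part of the code above.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

struct search_object {
    std::uint64_t index;
    std::uint64_t search_field;
};

// Orders by search_field first and by index second.
bool by_field_then_index(const search_object& a, const search_object& b) {
    return std::tie(a.search_field, a.index) < std::tie(b.search_field, b.index);
}

// `sorted` must be ordered by by_field_then_index.
bool has_match(const std::vector<search_object>& sorted,
               std::uint64_t left_index, std::uint64_t right_index,
               std::uint64_t number) {
    // Smallest possible key for this query: (number, left_index).
    const search_object key{left_index, number};
    auto it = std::lower_bound(sorted.begin(), sorted.end(), key, by_field_then_index);
    // The first element not less than the key matches only if it has the same
    // field and its index does not exceed right_index.
    return it != sorted.end() && it->search_field == number && it->index <= right_index;
}
```

After one std::sort with by_field_then_index, each query costs one binary search instead of a scan over all duplicates.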

std::upper_bound returns an iterator pointing to the first element in the range [first,last) that compares greater than the given value.
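
To illustrate how std::lower_bound and std::upper_bound together delimit a closed interval, here is a small sketch on plain integers. The helper count_in_closed_range is a hypothetical name used only for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Counts the elements v with low <= v <= high in an ascending sorted vector.
std::size_t count_in_closed_range(const std::vector<int>& sorted, int low, int high) {
    auto first = std::lower_bound(sorted.begin(), sorted.end(), low);  // first element >= low
    auto last = std::upper_bound(first, sorted.end(), high);           // first element > high
    return static_cast<std::size_t>(last - first);
}
```

For the sorted input {1, 3, 3, 5, 8}, the interval [3, 5] contains three elements and the interval [4, 4] contains none.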