Count the frequency of groups in a vector using an R-tree (or any other algorithm)

Keywords: vector, count, frequency, algorithm, rtree. Updated: 2023-10-16

Given the following set of points in a vector: {(100, 150), (101, 152), (102, 151), (105, 155), (50, 50), (51, 55), (55, 55), (150, 250), (190, 260)}

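I need to identify neighboring points and count them. Assume the acceptable distance has been set to 5. I then need the following output:

The frequency of point (100, 150) within 5 units is 4.
The frequency of point (50, 50) within 5 units is 3.
The frequency of point (150, 250) within 5 units is 1.
The frequency of point (190, 260) within 5 units is 1.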

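I tried an R-tree solution for this problem but could not work out the logic for excluding all neighboring points as candidates. That is, once I have determined that (100, 150) has four neighbors, I do not want to identify the neighbors of those neighbors; I want to move on to the next value. The assumptions are:
1. Efficiency is the primary concern.
2. The vector is not sorted.
3. The vector may contain thousands of points.

I am using C++ and the Boost implementation of R-tree. Please guide me toward a solution.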
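To make the requested behavior concrete, here is a minimal brute-force sketch of the "count a group, then skip its members" logic. It is not the R-tree solution asked for, only a reference for what the output should be. It assumes that "within 5 units" means at most 5 units along each axis: the expected counts of 4 and 3 require (105, 155) and (55, 55) to be neighbors of their seeds, and those points are about 7.07 units away in Euclidean terms. The names `Pt`, `within` and `count_groups` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

using Pt = std::pair<int, int>;  // (x, y)

// Assumption: "within radius" is measured per axis (|dx| <= radius and |dy| <= radius).
bool within(const Pt& a, const Pt& b, int radius) {
    return std::abs(a.first - b.first) <= radius &&
           std::abs(a.second - b.second) <= radius;
}

// Returns (seed point, group size) pairs in input order. Every point is counted
// in exactly one group: that of the first unclaimed seed that reaches it.
std::vector<std::pair<Pt, int>> count_groups(const std::vector<Pt>& pts, int radius) {
    assert(radius >= 0);
    std::vector<bool> claimed(pts.size(), false);
    std::vector<std::pair<Pt, int>> groups;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (claimed[i]) continue;  // already counted as someone's neighbor
        claimed[i] = true;
        int count = 1;             // the seed counts itself
        // All indices below i are claimed by now, so only later points are scanned.
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            if (!claimed[j] && within(pts[i], pts[j], radius)) {
                claimed[j] = true;
                ++count;
            }
        }
        groups.push_back({pts[i], count});
    }
    return groups;
}
```

For the point set above, `count_groups(points, 5)` yields the seeds (100, 150), (50, 50), (150, 250) and (190, 260) with counts 4, 3, 1 and 1. The inner scan is quadratic in the worst case; a spatial index would replace that scan, while the `claimed` bookkeeping stays the same.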

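Below is the code that counts the number of neighbors of each unique point in the vector. I need guidance on excluding a point's neighbors once they have been identified.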

#include <cstdio>
#include <iostream>
#include <iterator>
#include <set>
#include <utility>
#include <vector>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/index/rtree.hpp>

using namespace std;
namespace bg = boost::geometry;
namespace bgi = boost::geometry::index;

typedef bg::model::point<int, 2, bg::cs::cartesian> point;
typedef std::pair<point, unsigned> value;

// Lexicographic order (x first, then y) so that std::set gets a strict weak ordering.
struct ltstr
{
    bool operator()(const point &p1, const point &p2) const
    {
        if (p1.get<0>() != p2.get<0>())
            return p1.get<0>() < p2.get<0>();
        return p1.get<1>() < p2.get<1>();
    }
};

int main()
{
    vector<point> candidatePoints{ point(457, 184), point(457, 184), point(457, 184), point(457, 184), point(457, 184),
        point(456, 184), point(456, 184), point(456, 184), point(456, 184), point(456, 184),
        point(456, 184), point(457, 184), point(457, 184), point(457, 184), point(458, 184), point(459, 185) };
    bgi::rtree<value, bgi::quadratic<16>> rtree;
    set<point, ltstr> uniqueCandidatePoints;
    // Index every point (duplicates included) and collect the distinct ones.
    for (unsigned i = 0; i < candidatePoints.size(); ++i)
    {
        uniqueCandidatePoints.insert(candidatePoints[i]);
        rtree.insert(value(candidatePoints[i], i));
    }
    // For each distinct point, count all indexed points closer than 5 units.
    for (auto it = uniqueCandidatePoints.begin(); it != uniqueCandidatePoints.end(); ++it)
    {
        std::vector<value> returnedValues;
        point currentItem = *it;
        rtree.query(bgi::satisfies([&](value const& v) { return bg::distance(v.first, currentItem) < 5; }),
            std::back_inserter(returnedValues));
        cout << "Current Item: " << currentItem.get<0>() << "," << currentItem.get<1>() << " Count: " << returnedValues.size() << endl;
    }
    getchar(); // keep the console window open
    return 0;
}

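R-trees are among the most useful spatial indexing data structures, though they have proven useful mainly for specific domains and problems. That said, this is no reason to refrain from being didactic (after all, what is asked may be a simplification of the actual problem).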

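If you choose to use R-trees, you are performing a domain decomposition. Much like with space-filling curves, you get an ordering of the space at hand, and the elements of a node stay spatially close to one another (more so the further you go from the root).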

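An ideal solution would build the R-tree in such a way that regions of radius 5 are formed, but that would require a custom data structure and customization of the STR or bulk-loading algorithms, and it would resemble a clustering algorithm.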
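As a rough illustration of decomposing the domain into radius-sized regions, here is a sketch of a uniform grid with cells as wide as the search radius. This is a swapped-in technique (spatial hashing), not the custom R-tree described above, and it uses only the standard library. The names `Pt`, `cell_key` and `count_neighbors_grid` are hypothetical. It counts, for every point, the other points strictly closer than `r` in Euclidean distance; coordinates are assumed to map to cells that fit in 32 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Pt { double x, y; };

// Hypothetical helper: pack two 32-bit cell coordinates into one hash key.
std::uint64_t cell_key(std::int32_t cx, std::int32_t cy) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
           static_cast<std::uint32_t>(cy);
}

// For each point, count the other points strictly closer than r.
std::vector<std::size_t> count_neighbors_grid(const std::vector<Pt>& pts, double r) {
    assert(r > 0);
    auto cell = [r](double v) { return static_cast<std::int32_t>(std::floor(v / r)); };

    // Bucket point indices by cell. Cells are r wide, so every neighbor within r
    // lies in the same cell or in one of the 8 adjacent cells.
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < pts.size(); ++i)
        grid[cell_key(cell(pts[i].x), cell(pts[i].y))].push_back(i);

    std::vector<std::size_t> counts(pts.size(), 0);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const std::int32_t cx = cell(pts[i].x), cy = cell(pts[i].y);
        for (std::int32_t dx = -1; dx <= 1; ++dx) {
            for (std::int32_t dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(cell_key(cx + dx, cy + dy));
                if (it == grid.end()) continue;  // empty cell
                for (std::size_t j : it->second) {
                    const double ex = pts[j].x - pts[i].x;
                    const double ey = pts[j].y - pts[i].y;
                    // Compare squared distances to avoid sqrt and rounding at the boundary.
                    if (j != i && ex * ex + ey * ey < r * r) ++counts[i];
                }
            }
        }
    }
    return counts;
}
```

With a fixed radius and points spread reasonably evenly, each query only inspects a handful of buckets, which is why this kind of decomposition is often tried before a general-purpose index.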

With boost::index available, you can identify all neighborhoods. I'll try to elaborate on the code:

Mandatory includes

#include <vector>
#include <iostream>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point.hpp>
#include <boost/geometry/geometries/box.hpp>
#include <boost/geometry/index/rtree.hpp>

Definitions

namespace bg  = boost::geometry;
namespace bgi = boost::geometry::index;   
using  point  = bg::model::point < float, 2, bg::cs::cartesian > ;

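Helpers

Boost R-trees have a query method. Although it is designed to perform typical queries such as kNN or overlap, you can feed it custom predicates. Here we design one that returns true if the queried point is less than max_dist away from the base point (both values are specified at construction). The base point itself, at distance zero, is rejected.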

struct distance_pred
{
    point const& _base; 
    double       _threshold; 
    distance_pred(point const& base, double max_dist)
        : _base(base)
        , _threshold(max_dist)
    {
    }
    bool operator()(point const& p) const
    {
        auto d = boost::geometry::distance(_base, p); 
        return d && d < _threshold; 
    }
};
// just for output
std::ostream& operator<<(std::ostream &os, point const& p)
{
    os << "{ " << p.get<0>() << ", " << p.get<1>() << " }"; 
    return os; 
}

Execution

For each point, we query for the other points that lie less than distance=5 away from it.

int main()
{
    std::vector<point> cloud {
        point(100, 150), point(101, 152), point(102, 151), 
        point(105, 155), point( 50,  50), point( 51,  55), 
        point( 55,  55), point(150, 250), point(190, 260) 
    }; 
    bgi::rtree<point, bgi::quadratic<16>> rtree(cloud.begin(), cloud.end());
    std::vector<point> hood;
    for (auto &&p : cloud)
    {
        hood.clear(); 
        std::cout << "neighborhood of point " << p << "\n-----------------\n\n";
        rtree.query(bgi::satisfies(distance_pred(p, 5)), std::back_inserter(hood)); 
        // Output the results -----------------------------------------
        if (!hood.empty())
        {
            for (auto &&pt : hood) std::cout << '\t' << pt << std::endl;
        }
        else
        {
            std::cout << "\t... is empty\n"; 
        }
        }
        std::cout << std::endl; 
    }
}

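Extensions

If you want to exclude things, I believe a clustering algorithm would be more suitable, and that is outside the scope of R-trees. For example, what if a point excluded because of its proximity to point 1 happens to be closer to point 2?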
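The ambiguity raised here can be shown with a small sketch that uses only the standard library. It compares two hypothetical assignment rules for a point `p`: "first seed within the radius" (what exclusion in input order amounts to) and "nearest seed within the radius" (closer to what a clustering algorithm would do). The names `Pt`, `dist2`, `first_seed_within` and `nearest_seed_within` are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Squared Euclidean distance; enough for comparisons and exact for small integers.
double dist2(const Pt& a, const Pt& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Index of the first seed strictly closer than radius to p, or -1 if there is none.
int first_seed_within(const std::vector<Pt>& seeds, const Pt& p, double radius) {
    assert(radius > 0);
    for (std::size_t i = 0; i < seeds.size(); ++i)
        if (dist2(seeds[i], p) < radius * radius) return static_cast<int>(i);
    return -1;
}

// Index of the closest seed strictly closer than radius to p, or -1 if there is none.
int nearest_seed_within(const std::vector<Pt>& seeds, const Pt& p, double radius) {
    assert(radius > 0);
    int best = -1;
    double best_d2 = radius * radius;
    for (std::size_t i = 0; i < seeds.size(); ++i) {
        const double d2 = dist2(seeds[i], p);
        if (d2 < best_d2) {  // strictly closer than anything seen so far
            best_d2 = d2;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With seeds (0, 0) and (6, 0) and radius 5, the point (4, 0) goes to seed 0 under the first rule but to seed 1 under the second, so the group counts depend on which rule, and which input order, is used.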

However, if you really want to do it, it is just a matter of bookkeeping. Define a point type like

using  pointI  = std::pair<point, std::size_t>; // remember : geometric info first

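and transform your loop into

// rec holds the indices of points already recorded in some neighborhood
// (assumed here to be a std::vector<std::size_t>; std::find needs <algorithm>)
for (std::size_t i(0); i < cloud.size(); ++i)
{
    if (rec.end() == std::find(rec.begin(), rec.end(), i))
    { // you'll only be building neighborhoods for points that are not touched
        // queries and recording performed on pointI types
    }
}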

The full code demonstrates the problem with this logic: many neighborhoods remain empty.

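The same as above can be achieved with much less code, although it looks more complicated (basically, we put the query into a lambda function and use query iterators to loop through the results). (Demo)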

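An R-tree is just a data structure, not an algorithm, and to me it looks quite complicated. Unless you really need to deal with micro-efficiency, I would use plain standard algorithms and a vector of points. std::count_if would be my first guess.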
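A minimal sketch of that suggestion, assuming Euclidean distance and a strict "closer than max_dist" test that skips points coinciding with the base point. The names `Pt` and `count_near` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

// Number of points strictly closer than max_dist to base, not counting
// points that coincide with base (distance zero).
std::size_t count_near(const std::vector<Pt>& pts, const Pt& base, double max_dist) {
    assert(max_dist > 0);
    const double limit = max_dist * max_dist;  // compare squared distances
    return static_cast<std::size_t>(
        std::count_if(pts.begin(), pts.end(), [&](const Pt& p) {
            const double dx = p.x - base.x, dy = p.y - base.y;
            const double d2 = dx * dx + dy * dy;
            return d2 > 0 && d2 < limit;
        }));
}
```

Calling `count_near` once per point is a linear scan each time, so the total cost is quadratic; for a few thousand points that is a few million distance checks, which may well be fast enough before reaching for a spatial index.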