Create multiple indexes into a large collection of objects with smart pointers

Keywords: object creation, indexes, smart pointers. Updated: 2023-10-16

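I'm creating multiple indexes (i.e., using different keys) into a large collection of objects. The objects can change, and the collection can shrink and grow. My thoughts so far: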

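Keep multiple collections of some kind of pointer to the objects.
Use sets instead of maps for better encapsulation.
Use unordered_set to scale well with large data sets.
Ideally, the pointers should all be some form of smart pointer.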

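I can start off fairly easily with a master collection of unique_ptrs that manages all allocations, plus secondary indexes that use "raw" pointers (I've left out the supporting functions for now, but note that the secondary index is a multiset because its key won't be unique across the collection):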

typedef boost::unordered_set< std::unique_ptr<MyObject>,myobject_hash,myobjects_equal > MyObjects;
typedef boost::unordered_multiset<const MyObject*,myobject_index2_hash,myobject_index2_equal > MyObjectsIndex2;

Usage is simple:

MyObjects my_objects;
MyObjectsIndex2 my_objects_index2;
// insert() returns a pair<iterator, bool>; .first points at the stored unique_ptr
auto it_mo = my_objects.insert(
    std::unique_ptr<MyObject>(
        new MyObject(...)
    )
);
const MyObject* p_mo = it_mo.first->get();
my_objects_index2.insert(p_mo);

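I'm considering going to the extra effort of replacing the index's use of raw pointers with const references to the master collection's unique_ptrs. I'm not sure I can, at least not easily. I thought I'd ask whether anyone else has already gone down that route, or has other suggestions.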

Update

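Lessons learned so far:

Datastore classes are cool.
reference_wrappers are cool.
An xx_set whose elements carry their own "key" member is more efficient than an xx_map, which stores the key a second time. But... you can't easily use a unique_ptr as a set key in C++11. C++14 apparently improves this with heterogeneous lookup: given a transparent comparator, std::set<Key>::find accepts any type that compares with Key. See here for more details. So, for now, a datastore that manages raw allocations seems to make more sense here than trying to force unique_ptr into service as a set key, or paying for the extra key storage of a map.
Remember to keep key values constant for the object's lifetime (make them const members initialized from constructor arguments).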

Here's one way.

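Use a std::vector<unique_ptr> to hold the data items (so that their addresses don't change when the vector resizes), and then hold reference_wrappers (copyable references) to them to build the indexes.

A compilable example: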

#include <map>
#include <vector>
#include <set>
#include <string>
#include <functional>
#include <memory>
#include <iostream>
struct Thing {
    Thing(std::string name, int value)
    : _name { std::move(name) }
    , _value { value }
    {}
    const std::string& name() const {
        return _name;
    }
    void write(std::ostream& os) const {
        os << "{ " << _name << " : " << _value << " }";
    }    
private:
    std::string _name;
    int _value;
};
inline std::ostream& operator<<(std::ostream& os, const Thing& t) {
    t.write(os);
    return os;
}
struct multi_index
{
    using multi_by_name_index = std::multimap<std::string, std::reference_wrapper<Thing>>;
    using multi_by_name_range = std::pair<multi_by_name_index::const_iterator, multi_by_name_index::const_iterator>;
    void add_thing(std::string name, int value) {
        // todo: checks to ensure that indexes won't be violated
        // add a new thing to the main store; wrapping it in a unique_ptr first
        // means the Thing is not leaked if emplace_back throws
        _main_store.emplace_back(std::unique_ptr<Thing>(new Thing{std::move(name), value}));
        // store a reference to it in each index
        auto& new_thing = *_main_store.back();
        _name_index.emplace(new_thing.name(), new_thing);
    }
    // equal_range keeps equal keys in insertion order
    multi_by_name_range get_all_by_name(const std::string& name) const
    {
        return _name_index.equal_range(name);
    }
private:
    std::vector<std::unique_ptr<Thing>> _main_store;
    multi_by_name_index _name_index;
};
using namespace std;
int main()
{
    multi_index mi;
    mi.add_thing("bob", 8);
    mi.add_thing("ann", 4);
    mi.add_thing("bob", 6);
    auto range = mi.get_all_by_name("bob");
    for( ; range.first != range.second ; ++range.first) {
        cout << range.first->second << endl;
    }
    return 0;
}

Expected output:

{ bob : 8 }
{ bob : 6 }

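I recognize that your use case may differ from the one I've made up for my example, and without more detail I can't build one that matches (I also suspect that if you had that much detail worked out, you'd be able to find the solution yourself).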

#include <iostream>
#include <map>
#include <set>
#include <string>
#include <memory>
#include <stdexcept>
using namespace std;
class Thing
{
public:
    Thing() = default;
    Thing(const Thing &other) = default;
    Thing(int i, string p, string d) : id(i), desc(d), part(p) {}
    int    id;
    string desc;
    string part;
};
ostream &operator<<(ostream &out, const Thing &t)
{
    // callers check for a null shared_ptr before dereferencing (see main)
    out << t.id << ": " << t.part << " (" << t.desc << ")";
    return out;
}
class Datastore
{
public:
    Datastore() = default;
    shared_ptr<const Thing> Add(const Thing &t)
    {
        // every indexed field must be unique
        if (!(index_bydesc.find(t.desc) == index_bydesc.end() &&
              index_bypart.find(t.part) == index_bypart.end() &&
              index_byid.find(t.id) == index_byid.end()))
            throw runtime_error("Non-unique insert");
        shared_ptr<const Thing> newt = make_shared<Thing>(t);
        weak_ptr<const Thing> weak(newt);  // indexes observe; the store owns
        index_bydesc[newt->desc] = weak;
        index_bypart[newt->part] = weak;
        index_byid[newt->id] = weak;
        store.insert(newt);
        return newt;
    }
    void Remove(const Thing &t)
    {
        shared_ptr<const Thing> p = FindBy_Desc(t.desc);
        if (!p) return;  // not in the store
        // p keeps the object alive until every index entry is erased
        store.erase(p);
        index_bydesc.erase(p->desc);
        index_bypart.erase(p->part);
        index_byid.erase(p->id);
    }
    shared_ptr<const Thing> FindBy_Desc(const string &desc)
    {
        map<string, weak_ptr<const Thing> >::iterator iter = index_bydesc.find(desc);
        if (iter == index_bydesc.end()) return shared_ptr<const Thing>();
        return iter->second.lock();
    }
    // index accessors for part and id omitted
private:
    std::set<shared_ptr<const Thing> > store;
    std::map<string, weak_ptr<const Thing> > index_bydesc;
    std::map<string, weak_ptr<const Thing> > index_bypart;
    std::map<int, weak_ptr<const Thing> > index_byid;
};
int main() {
    Datastore d;
    d.Add(Thing(1, "TRNS-A", "Automatic transmission"));
    d.Add(Thing(2, "SPKPLG", "Spark plugs"));
    d.Add(Thing(3, "HOSE-S", "Small hoses"));
    d.Add(Thing(4, "HOSE-L", "Large hoses"));
    d.Add(Thing(5, "BATT-P", "Primary battery (14.5v nominal)"));
    d.Add(Thing(6, "BATT-S", "Secondary batteries (1.5v nominal)"));
    d.Add(Thing(7, "CRKSFT", "Crank shaft"));
    d.Add(Thing(8, "REAC-F", "Fusion reactor power source"));
    cout << *d.FindBy_Desc("Crank shaft") << endl;
    d.Remove(*d.FindBy_Desc("Crank shaft"));
    // after removal the lookup returns an empty shared_ptr, so check before dereferencing
    shared_ptr<const Thing> removed = d.FindBy_Desc("Crank shaft");
    if (removed) cout << *removed << endl;
    else cout << "(NULL)" << endl;
    return 0;
}

Drawbacks:

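The store is read-only. This is a necessary drawback: if an object's indexed fields were modified while it is in the datastore, the indexes would become stale. To modify an object, remove it and add a new one.
All fields must be unique. This is easy to change, but for non-unique fields you'd need indexes whose values are a list<Thing> rather than a single Thing.
std::map has performance costs. For very large data structures, std::unordered_map is an alternative (likewise std::unordered_set).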

Deviations from your plan:

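Given that you have clear key-value relationships here, I think maps are a better fit than sets.
To avoid the cost of reference counting, and provided you are always careful to maintain internal consistency, you could drop the smart pointers entirely in favor of raw pointers and return values by reference. You could go further by using unsafe object-ownership semantics when populating the store (i.e., pass it pointers to heap-allocated objects and let the datastore take ownership of them). This is more complex, but ultimately means fewer copies and less runtime overhead.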