Speed of using a Map vs a Vector

Keywords: Map vs Vector, speed      Last updated: 2023-10-16

Hi, I have written two versions of a class, one using a map and the other using two vectors:

class NucleotideSequence{
private:
    std::string Name;
    std::vector<int> BasePos;
    std::vector<char> BaseChar;
public:
    NucleotideSequence(std::string name, std::vector<int> &bp, std::vector<char> &bases);
    std::string getName();
    char getBase(int pos); // get a base by its position in the char array.
    char getAbBase(int abPos); // get a base by its actual bp position.
};

class NucleotideSequence2{
private:
    std::string Name;
    std::map<int, char> Sequence;
public:
    NucleotideSequence2(std::string &name, std::map<int, char> &seq) throw(FormatError);
    std::string getName();
};

Then I defined constructors for them:

NucleotideSequence::NucleotideSequence(std::string name, std::vector<int> &bp, std::vector<char> &bases)
:Name(name), BasePos(bp), BaseChar(bases)
{
    for (std::vector<char>::iterator i = BaseChar.begin(); i != BaseChar.end(); i++) {
        switch (*i) {
            case 'A': case 'T': case 'C': case 'G': case '-': case 'N':
                break;
            case 'a':
                *i = 'A';
                break;
            case 't':
                *i = 'T';
                break;
            case 'c':
                *i = 'C';
                break;
            case 'g':
                *i = 'G';
                break;
            case 'n':
                *i = 'N';
                break;
            default:
                throw FormatError();
                break;
        }
    }
}
NucleotideSequence2::NucleotideSequence2(std::string &name, std::map<int, char> &seq) throw(FormatError)
: Name(name), Sequence(seq)
{
    for (std::map<int, char>::iterator i = Sequence.begin(); i != Sequence.end(); i++) {
        switch (i->second) {
            case 'A': case 'T': case 'C': case 'G': case '-': case 'N':
                break;
            case 'a':
                i->second = 'A';
                break;
            case 't':
                i->second = 'T';
                break;
            case 'c':
                i->second = 'C';
                break;
            case 'g':
                i->second = 'G';
                break;
            case 'n':
                i->second = 'N';
                break;
            default:
                throw FormatError();
                break;
        }
    }
}

These two constructors are called from two different functions:

NucleotideSequence Sequence_stream::get()
{
    if (FileStream.is_open() == false)
        throw StreamClosed(); // Make sure the stream is indeed open else throw an exception.
    if (FileStream.eof())
        throw FileEnd();
    char currentchar;
    int basepos = 0;
    std::string name;
    std::vector<char> sequence;
    std::vector<int> postn;
    currentchar = FileStream.get();
    if (FileStream.eof())
        throw FileEnd();
    if (currentchar != '>')
        throw FormatError();
    currentchar = FileStream.get();
    while(currentchar != '\n' && false == FileStream.eof())
    {
        name.append(1, currentchar);
        currentchar = FileStream.get();
    } // done getting names, now let's get the sequence.
    currentchar = FileStream.get();
    while(currentchar != '>' && false == FileStream.eof())
    {
        if(currentchar != '\n' && currentchar != ' '){
            basepos++;
            sequence.push_back(currentchar);
            postn.push_back(basepos);
        }
        currentchar = FileStream.get();
    }
    if(currentchar == '>')
    {
        FileStream.unget();
    }
    return NucleotideSequence(name, postn, sequence);
}

NucleotideSequence2 Sequence_stream::get2()
{
    if (FileStream.is_open() == false)
        throw StreamClosed(); // Make sure the stream is indeed open else throw an exception.
    if (FileStream.eof())
        throw FileEnd();
    char currentchar;
    int basepos = 0;
    std::string name;
    std::map<int, char> sequence;
    currentchar = FileStream.get();
    if (FileStream.eof())
        throw FileEnd();
    if (currentchar != '>')
        throw FormatError();
    currentchar = FileStream.get();
    while(currentchar != '\n' && false == FileStream.eof())
    {
        name.append(1, currentchar);
        currentchar = FileStream.get();
    } // done getting names, now let's get the sequence.
    currentchar = FileStream.get();
    while(currentchar != '>' && false == FileStream.eof())
    {
        if(currentchar != '\n' && currentchar != ' '){
            basepos++;
            sequence[basepos] = currentchar;
        }
        currentchar = FileStream.get();
    }
    if(currentchar == '>')
    {
        FileStream.unget();
    }
    return NucleotideSequence2(name, sequence);
}

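These two functions can then be called from another function (which catches the exceptions, in case you are wondering about the uncaught throws).

The difference between the two classes is that one contains two vectors, while in the other the same information is held in a single map.

My problem is this: the first class, and the "get" that builds it, works very fast, almost instantly. "get2", which builds the second class (the one with the map), is noticeably slower, taking just over 5 seconds.

Why is constructing the class with a map slower than constructing it with two vectors? As you can see, I have kept the constructors and the two get functions nearly identical, apart from adding elements to vectors versus adding key-value pairs to a map. So I suspect that repeatedly pushing back onto vectors is faster and more efficient than repeatedly adding key-value pairs (i.e. mymap['newkey'] = 'newvalue';).

How can I speed up the map version?

Thanks, Ben.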

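A vector performs a single allocation (if you tell it the required capacity up front), or at most a small number of allocations. A map performs a separate dynamic allocation for every element.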
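One way to see the difference in allocation counts is to plug a counting allocator into both containers. The following is only a minimal sketch: CountingAllocator, g_allocations, countVectorAllocations and countMapAllocations are hypothetical names made up for this illustration, and the exact counts depend on the standard library implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical global counter, incremented on every allocate() call.
static std::size_t g_allocations = 0;

// Minimal allocator that forwards to std::allocator and counts the calls.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_allocations;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

// Number of allocations made while storing n bases in a vector
// whose capacity is reserved up front.
std::size_t countVectorAllocations(int n) {
    assert(n >= 0);
    g_allocations = 0;
    std::vector<char, CountingAllocator<char>> bases;
    bases.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) bases.push_back('A');
    return g_allocations;
}

// Number of allocations made while storing n (position, base) pairs in a map.
std::size_t countMapAllocations(int n) {
    assert(n >= 0);
    g_allocations = 0;
    std::map<int, char, std::less<int>,
             CountingAllocator<std::pair<const int, char>>> seq;
    for (int i = 1; i <= n; ++i) seq[i] = 'A';
    return g_allocations;
}
```

On common implementations, calling countVectorAllocations(1000) should report a single allocation, while countMapAllocations(1000) should report at least one allocation per element.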

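You might like to experiment with a sorted vector of pairs, or a "flat map" (in Boost), or a btree map (there is one on Google Code), and compare performance. Memory locality can make a huge difference, and if you don't need the strong iterator validity guarantees of std::map, you will quite likely find a data structure that performs better.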
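The sorted vector of pairs can be sketched roughly as follows. FlatSequence and its member names are hypothetical and not part of the code in the question; the sketch assumes positions are appended in increasing order, which is how get2() produces them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the map-based sequence: (position, base) pairs
// kept sorted by position in one contiguous buffer.
class FlatSequence {
public:
    explicit FlatSequence(std::size_t expectedSize = 0) {
        data_.reserve(expectedSize);  // one allocation if the size is known
    }

    // Append a base. Positions must arrive in strictly increasing order,
    // as they do when a file is read sequentially.
    void append(int pos, char base) {
        assert(data_.empty() || data_.back().first < pos);
        data_.emplace_back(pos, base);
    }

    // Binary search by position; returns fallback if the position is absent.
    char baseAt(int pos, char fallback = '?') const {
        auto it = std::lower_bound(
            data_.begin(), data_.end(), pos,
            [](const std::pair<int, char>& entry, int key) {
                return entry.first < key;
            });
        return (it != data_.end() && it->first == pos) ? it->second : fallback;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<std::pair<int, char>> data_;
};
```

Lookups stay O(log n) as with std::map, but the elements sit next to each other in memory. The trade-off is that inserting in the middle is O(n) and invalidates iterators, so this layout fits best when the data is built once and then mostly read.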

How can I speed up the map version?

Try using an unordered map (std::unordered_map) instead of a regular map.
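A minimal sketch of that suggestion, assuming C++11 or later: buildUnorderedSequence is a hypothetical helper that mirrors the inner loop of get2() but reads the bases from a string instead of the file stream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: builds position -> base with 1-based positions,
// skipping newlines and spaces as get2() does.
std::unordered_map<int, char> buildUnorderedSequence(const std::string& bases) {
    std::unordered_map<int, char> seq;
    seq.reserve(bases.size());  // size the bucket array once to avoid rehashing
    int basepos = 0;
    for (char c : bases) {
        if (c != '\n' && c != ' ') {
            ++basepos;
            seq.emplace(basepos, c);
        }
    }
    return seq;
}
```

Note that std::unordered_map does not keep the positions in sorted order when iterated, and it typically still allocates a node per element, so it is worth measuring against the vector-based alternatives rather than assuming it is faster.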