Which of the two algorithms for finding the suffix array is faster, and why?

Updated: 2023-10-16

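I'm new to algorithmic complexity, so I can't work out the complexity of the following two algorithms. Both compute the suffix array of a given string. I wrote the first one myself and found the second one online. I'd like to know which is faster, and why.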

First algorithm

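#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;

struct suffix {
    string str;  // text of the suffix
    int pos;     // index in the input where the suffix starts
};

int main()
{
    string input;
    getline(cin, input);        // read one line of input
    int n = input.length();
    vector<suffix> arr(n);      // one entry per suffix, sized to the input
    for (int i = 0; i < n; i++)
    {
        // Build suffix i one character at a time.
        for (int j = i; j < n; j++)
        {
            arr[i].str += input[j];
        }
        arr[i].pos = i;
        // Insert suffix i into the already sorted range arr[0..i-1].
        // After a swap, arr[i] holds the displaced entry, which must keep
        // moving right, so the loop does not break early.
        for (int j = 0; j < i; j++)
        {
            if (arr[i].str.compare(arr[j].str) < 0)
            {
                swap(arr[i], arr[j]);
            }
        }
    }
    for (int i = 0; i < n; i++)
        cout << arr[i].pos << ",";
    return 0;
}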

Second algorithm

#include <bits/stdc++.h>
using namespace std;

// suffixRank[i][j] is the rank of the suffix starting at j after iteration i.
// 21 rows cover strings of up to 1e6 characters: row 0 plus up to 20 doubling steps.
int suffixRank[21][int(1E6)];

// Example: "abaab"
// Suffix array: (2, 3, 0, 4, 1)

// One entry per suffix: its start index and the ranks of its two halves.
struct myTuple {
    int originalIndex;   // start index of the suffix in the string
    int firstHalf;       // rank of the first half of the compared prefix
    int secondHalf;      // rank of the second half, or -1 if it runs past the end
};

// Compare two suffixes in O(1) using the ranks from the previous iteration:
// order by first-half rank, and by second-half rank when the first halves tie.
bool cmp(const myTuple& a, const myTuple& b) {
    if (a.firstHalf == b.firstHalf) return a.secondHalf < b.secondHalf;
    return a.firstHalf < b.firstHalf;
}

int main() {
    // Read the input string; N is its length.
    string s; cin >> s;
    int N = s.size();

    // Initial ranks come from single characters: 'a' = 0, 'b' = 1, ..., 'z' = 25.
    for (int i = 0; i < N; ++i)
        suffixRank[0][i] = s[i] - 'a';

    // One tuple per suffix.
    vector<myTuple> L(N);

    // Iterate about log2(N) times, until all suffixes are fully sorted.
    // 'stp' counts iterations; 'cnt' is the length of each half being compared.
    // Each iteration builds the tuples from the ranks of the previous iteration.
    for (int cnt = 1, stp = 1; cnt < N; cnt *= 2, ++stp) {
        for (int i = 0; i < N; ++i) {
            L[i].firstHalf = suffixRank[stp - 1][i];
            L[i].secondHalf = i + cnt < N ? suffixRank[stp - 1][i + cnt] : -1;
            L[i].originalIndex = i;
        }

        // Sort the suffixes by their (firstHalf, secondHalf) rank pairs.
        sort(L.begin(), L.end(), cmp);

        // The smallest suffix after sorting gets rank 0.
        suffixRank[stp][L[0].originalIndex] = 0;
        for (int i = 1, currRank = 0; i < N; ++i) {
            // A suffix shares the rank of its predecessor in sorted order
            // when both halves match; otherwise it gets the next rank.
            if (L[i - 1].firstHalf != L[i].firstHalf || L[i - 1].secondHalf != L[i].secondHalf)
                ++currRank;
            suffixRank[stp][L[i].originalIndex] = currRank;
        }
    }

    // Print the suffix array.
    for (int i = 0; i < N; ++i) cout << L[i].originalIndex << endl;
    return 0;
}

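To determine which one runs faster for a given N, you need to run both and measure. To determine which one scales better, however, you can simply look at your loops.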
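
Measuring can be done with `std::chrono`. The following is a minimal sketch and not part of the original question: `time_ms` and `naive_suffix_array` are hypothetical helper names, and the naive builder simply sorts suffix start positions by comparing the suffixes directly. The same wrapper could be placed around the core of either program.

```cpp
#include <algorithm>
#include <chrono>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper: run `work` once and return the elapsed time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical baseline: sort suffix start positions by comparing the
// suffixes themselves, character by character.
std::vector<int> naive_suffix_array(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., N-1
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}
```

For example, `double ms = time_ms([&] { naive_suffix_array(text); });` times one run on a string `text`. Repeating the measurement for several input lengths shows how the running time grows.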

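In your first algorithm, you have nested loops that run from 0 to input.size() in increments of 1, which is O(N^2) (if input.size() is 1, both loops run once for a total of 1 iteration; if input.size() is 2, the outer loop runs twice and the inner loop runs twice per outer iteration for a total of 4 iterations, and so on). That counts loop iterations only; each string comparison or copy inside the loops can itself touch up to N characters, so the cost in character operations can be higher still.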
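
The iteration count can be checked with a small counter. This is a sketch under the assumption that the second inner loop never exits early; `count_nested_iterations` is a hypothetical name, and the function mirrors only the loop structure of the first program, without any string work.

```cpp
#include <cstddef>

// Counts inner-loop iterations for loops shaped like those in the first program.
std::size_t count_nested_iterations(std::size_t n) {
    std::size_t iterations = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) ++iterations;  // building suffix i: n - i steps
        for (std::size_t j = 0; j < i; ++j) ++iterations;  // scanning earlier suffixes: i steps
    }
    return iterations;  // (n - i) + i = n per outer pass, so n * n in total
}
```

Calling `count_nested_iterations(2)` gives 4 and `count_nested_iterations(10)` gives 100, matching the quadratic growth described above.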

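The second algorithm, by contrast, has an outer loop whose counter starts at 1 and doubles on each iteration until it reaches N, so it runs about log(N) times rather than N times. Counting loop iterations alone, that is O(N*log(N)). Each pass also calls sort, which costs O(N*log(N)) comparisons, so the total once the sort is included is O(N*log(N)^2). Either way it grows more slowly than O(N^2) and will likely scale better.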
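
To see how few passes a doubling loop makes, the following sketch counts them. `count_doubling_rounds` is a hypothetical name; the loop header copies the shape of the outer loop in the second program.

```cpp
#include <cstddef>

// Counts how many times a loop of the form
//   for (cnt = 1; cnt < n; cnt *= 2)
// executes its body.
std::size_t count_doubling_rounds(std::size_t n) {
    std::size_t rounds = 0;
    for (std::size_t cnt = 1; cnt < n; cnt *= 2) ++rounds;
    return rounds;
}
```

For N = 1,000,000 this returns 20, compared with 1,000,000 passes for a loop that steps by 1.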