Viterbi algorithm with OpenMP
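I am trying to implement the Viterbi algorithm with the help of OpenMP. So far, my tests show that the parallel program takes roughly 4 times as long to run as the sequential one. Here is my code: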
#include <omp.h>
#include <stdio.h>
#include <stdlib.h> // rand, srand
#include <time.h>

#define K 39 // num states
#define T 1500 // num obs sequence

int states[K];
double transition[K][K];
double emission[K][K];
double init_prob[K];
int observation[T];

using namespace std;

void generateValues()
{
    srand(time(NULL));
    for(int i=0; i<T; i++)
    {
        observation[i] = rand() % K;
    }
    for(int i=0; i<K; i++)
    {
        states[i] = i;
        init_prob[i] = (double)rand() / (double)RAND_MAX;
        for(int j=0;j<K;j++)
        {
            transition[i][j] = (double)rand() / (double)RAND_MAX;
            srand(time(NULL));
            emission[i][j] = (double)rand() / (double)RAND_MAX;
        }
    }
}

int* viterbi(int *S, double *initp, int *Y, double A[][K], double B[][K])
{
    double T1[K][T];
    int T2[K][T];
    #pragma omp parallel for
    for(int i=0; i<K; ++i)
    {
        T1[i][0] = initp[i];
        T2[i][0] = 0;
    }
    for(int i=1; i<T; ++i)
    {
        double max, temp;
        int argmax;
        #pragma omp parallel for private (max, temp, argmax)
        for(int j=0; j<K; ++j)
        {
            max = -1;
            #pragma omp parallel for
            for(int k=0; k<K; ++k)
            {
                temp = T1[k][i-1] * A[k][j] * B[k][Y[i-1]];
                if(temp > max)
                {
                    max = temp;
                    argmax = k;
                }
            }
            T1[j][i] = max;
            T2[j][i] = argmax;
        }
    }
    int Z[T];
    int X[T];
    double max = -1, temp;
    #pragma omp parallel for
    for(int k=0; k<K; ++k)
    {
        temp = T1[k][T-1];
        if(temp > max)
        {
            max = temp;
            Z[T-1] = k;
        }
    }
    X[T-1] = S[Z[T-1]];
    for(int i=T-1; i>0; --i)
    {
        Z[i-1] = T2[Z[i]][i];
        X[i-1] = S[Z[i-1]];
    }
    return X;
}

int* viterbiNoOmp(int *S, double *initp, int *Y, double A[][K], double B[][K]) // the same as before, minus the #pragma omp

int main()
{
    clock_t tStart;
    int *path;
    generateValues();
    double sumOmp = 0;
    for(int i=0;i<6;i++)
    {
        double start = omp_get_wtime();
        path = viterbi(states, init_prob, observation, transition, emission);
        double end = omp_get_wtime();
        sumOmp += end - start;
    }
    double sumNoOmp = 0;
    for(int i=0;i<6;i++)
    {
        tStart = clock();
        path = viterbiNoOmp(states, init_prob, observation, transition, emission);
        sumNoOmp += ((double)(clock() - tStart)/CLOCKS_PER_SEC);
    }
    for (int i=0;i<T;i++)
    {
        printf("%d, ", path[i]);
    }
    printf("\n\ntime With Omp: %f\ntime without Omp: %f", sumOmp/6, sumNoOmp/6);
    return 0;
}
What am I doing wrong?
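First, you time the first measurement with omp_get_wtime() and the second with clock(). These are different clocks, so the two results are not directly comparable. Use omp_get_wtime() for both and you will see some improvement.
Second, instead of sumNoOmp += ((double)(clock() - tStart)/CLOCKS_PER_SEC); use sumNoOmp = ((double)(clock() - tStart)/CLOCKS_PER_SEC); and, if you do, drop the division by 6 when printing, because the variable then holds only the last run's time rather than the sum over six runs.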
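As a rough illustration of timing both versions with the same clock, here is a minimal sketch. The helper names wall_seconds and average_seconds are hypothetical and do not appear in the question. The std::chrono fallback is an assumption added so that the snippet still builds when OpenMP is not enabled. On many platforms clock() reports CPU time for the whole process, which can add up across threads, whereas omp_get_wtime() reports elapsed wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not from the question): one wall clock for every measurement.
// Uses omp_get_wtime() when compiled with OpenMP; otherwise falls back to
// std::chrono::steady_clock (an assumption made so the sketch builds either way).
static double wall_seconds()
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using clock_type = std::chrono::steady_clock;
    return std::chrono::duration<double>(clock_type::now().time_since_epoch()).count();
#endif
}

// Hypothetical helper: average wall time, in seconds, of `runs` calls to `fn`.
template <typename Fn>
double average_seconds(Fn fn, int runs)
{
    assert(runs > 0);
    double sum = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        const double start = wall_seconds();
        fn();
        sum += wall_seconds() - start; // accumulate, then divide once at the end
    }
    return sum / runs;
}
```

With helpers like these, both viterbi and viterbiNoOmp could be passed as a lambda to average_seconds with runs set to 6, so the two averages come from the same clock and the same accumulation logic.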
Now on to your code: parallelizing nested loops is a bit tricky. Try using #pragma omp parallel for only on the outer loop, and observe the results.
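The suggestion above could look roughly like the following sketch, which is one possible reading rather than the asker's final code. The function name viterbi_outer_only is hypothetical. The sketch deliberately departs from the question in a few ways: it uses std::vector and returns the path by value, it stores T1 and T2 as [time][state], and it runs the final argmax serially. The recurrence, including the B[k][Y[i-1]] indexing, is copied from the question as written. The loop over i stays sequential because each step depends on the previous one, so the single parallel for goes on the loop over j, and the inner loop over k uses variables local to each iteration.

```cpp
#include <cassert>
#include <vector>

// Sketch only: returns the most likely sequence of state indices.
// A[k][j] and B[k][y] are indexed exactly as in the question's code.
std::vector<int> viterbi_outer_only(const std::vector<double>& initp,
                                    const std::vector<int>& Y,
                                    const std::vector<std::vector<double>>& A,
                                    const std::vector<std::vector<double>>& B)
{
    const int K = static_cast<int>(initp.size()); // number of states
    const int T = static_cast<int>(Y.size());     // number of observations
    assert(K > 0 && T > 0);

    std::vector<std::vector<double>> T1(T, std::vector<double>(K));
    std::vector<std::vector<int>> T2(T, std::vector<int>(K, 0));

    for (int k = 0; k < K; ++k)
        T1[0][k] = initp[k];

    for (int i = 1; i < T; ++i) // sequential: step i needs the results of step i-1
    {
        #pragma omp parallel for
        for (int j = 0; j < K; ++j)
        {
            double best = -1.0; // declared inside the loop, so private to each iteration
            int argbest = 0;
            for (int k = 0; k < K; ++k) // plain serial inner loop, no nested pragma
            {
                const double p = T1[i-1][k] * A[k][j] * B[k][Y[i-1]];
                if (p > best)
                {
                    best = p;
                    argbest = k;
                }
            }
            T1[i][j] = best;    // each j writes its own element, so no race
            T2[i][j] = argbest;
        }
    }

    // Final argmax over K states, kept serial to avoid a shared-variable race.
    int last = 0;
    for (int k = 1; k < K; ++k)
        if (T1[T-1][k] > T1[T-1][last])
            last = k;

    // Backtrack through the stored argmax table.
    std::vector<int> path(T);
    path[T-1] = last;
    for (int i = T-1; i > 0; --i)
        path[i-1] = T2[i][path[i]];
    return path;
}
```

Even in this form, each time step opens a parallel region over only K iterations, so with K = 39 the threading overhead may still outweigh the gain. Measuring on the target machine is the only reliable check.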