Segmentation fault, (core dumped) when changing number of processors in lpthreads

Updated: 2023-10-16

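I get a segmentation fault when I run my code on 8 processors, but it works fine on 1 and on 4 processors.

I am using the pthread library (linked with -lpthread), and this is the function that each thread executes. I can add more code if needed.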

void *compute_gauss(void *threadid){
  int local_row, local_norm, col;
  float multiplier;
  long tid;
  tid = (long)threadid;
  fprintf(stdout, "Thread %ld has started\n", tid);
  while (global_norm < N){
    while (global_row < N) {
      pthread_mutex_lock(&global_row_lock);
      local_row = global_row;
      global_row++;
      pthread_mutex_unlock(&global_row_lock);
      print_inputs();
      multiplier = A[local_row][global_norm] / A[global_norm][global_norm];
      for (col = global_norm; col < N; col++) {
        A[local_row][col] -= A[global_norm][col] * multiplier;
      }
      B[local_row] -= B[global_norm] * multiplier;
    }
    pthread_barrier_wait(&barrier);
    if (tid == 0){
      global_norm++;
      global_row=global_norm+1;
    }
    pthread_barrier_wait(&barrier); // wait until all threads arrive
  }
}

This is the calling function, where I initialize the barrier:

void gauss() {
    int norm, row, col;  /* Normalization row, and zeroing
                          * element row and col */
    int i = 0;
    float multiplier;
    pthread_t threads[procs]; //declared array of threads equal in size to # processors
    global_norm = 0;
    global_row = global_norm+1;
    printf("Computing Parallelized Algorithm.\n");
    pthread_barrier_init(&barrier, NULL, procs);
    /* Gaussian elimination */
    for (i = 0; i < procs; i++){
      pthread_create(&threads[i], NULL, &compute_gauss, (void *)i);
    }
    printf("finished creating threads\n");
    for (i = 0; i < procs; i++){
      pthread_join( threads[i], NULL);
    }
    printf("finished joining threads\n");
    /* (Diagonal elements are not normalized to 1.  This is treated in back
     * substitution.) */
    fprintf(stdout, "pre back substitution");
    /* Back substitution */
    for (row = N - 1; row >= 0; row--) {
      X[row] = B[row];
      for (col = N-1; col > row; col--) {
        X[row] -= A[row][col] * X[col];
      }
      X[row] /= A[row][row];
    }
    fprintf(stdout, "post back substitution");
  }

Here is an example of how the code can run past the end of the array; please correct me if I am wrong:

// suppose global_row = N - 1;
while (global_row < N) {
    pthread_mutex_lock(&global_row_lock);   // thread 2 waits here, global_row is N - 1;
    local_row = global_row;                 // thread 1 is here, global_row is N - 1;
    global_row++;
    pthread_mutex_unlock(&global_row_lock);
    // when thread 2 goes here, local_row is going to be N, out of array boundary.
    multiplier = A[local_row][global_norm] / A[global_norm][global_norm];
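
One way to close the window described in the comments above is to test the bound and take the row inside the same critical section. The following is a minimal sketch under that assumption, not the asker's actual fix: `N_ROWS`, `claim_row`, `times_claimed` and `run_claim_demo` are hypothetical names introduced for illustration, and the elimination arithmetic is replaced by a simple counter.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define N_ROWS 64        /* hypothetical matrix size, stands in for N */
#define MAX_THREADS 16   /* hypothetical upper bound on worker threads */

static pthread_mutex_t row_lock = PTHREAD_MUTEX_INITIALIZER;
static int next_row;               /* shared counter, only touched under row_lock */
static int times_claimed[N_ROWS];  /* how often each row was handed out */

/* Hand out the next unprocessed row, or -1 when none are left.
 * The bounds test and the increment happen under the same lock,
 * so no thread can receive an index >= N_ROWS. */
static int claim_row(void) {
    int row = -1;
    pthread_mutex_lock(&row_lock);
    if (next_row < N_ROWS) {
        row = next_row++;
    }
    pthread_mutex_unlock(&row_lock);
    return row;
}

static void *worker(void *arg) {
    int row;
    (void)arg;
    while ((row = claim_row()) != -1) {
        assert(row >= 0 && row < N_ROWS);  /* index is always in bounds */
        times_claimed[row]++;              /* this row belongs to this thread only */
    }
    return NULL;
}

/* Run nthreads workers over rows first_row..N_ROWS-1 and return how many
 * of those rows were claimed exactly once. Error handling is omitted. */
int run_claim_demo(int nthreads, int first_row) {
    pthread_t threads[MAX_THREADS];
    int i, ok = 0;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    memset(times_claimed, 0, sizeof times_claimed);
    next_row = first_row;
    for (i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    for (i = first_row; i < N_ROWS; i++)
        if (times_claimed[i] == 1) ok++;
    return ok;
}
```

Calling `run_claim_demo(8, N_ROWS - 1)` reproduces the scenario from the comments: eight threads compete for the single remaining row, and only one of them receives it.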

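You have not included enough code for me to test your program. However, I am fairly sure the problem is that there is no mutex protecting global_norm, global_row, and print_inputs(). You need to either protect them with a mutex or use an atomic increment operator. You do not see the crash under a debugger because the debugger changes your timing.

Shouldn't you check the return value of pthread_barrier_wait and test for PTHREAD_BARRIER_SERIAL_THREAD?

It is also unclear why pthread_barrier_wait is called twice.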
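
The atomic increment option mentioned above could look roughly like the sketch below. It assumes a C11 compiler with `<stdatomic.h>`; `N_ROWS`, `atomic_worker`, `rows_done` and `run_atomic_demo` are hypothetical names, and the per-row work is reduced to counting. The key point is that each thread uses the value returned by `atomic_fetch_add` and checks it against the bound before touching any array, instead of re-reading the shared counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N_ROWS 1000      /* hypothetical matrix size, stands in for N */
#define MAX_THREADS 16   /* hypothetical upper bound on worker threads */

static atomic_int next_row;   /* shared row counter */
static atomic_int rows_done;  /* number of rows actually processed */

static void *atomic_worker(void *arg) {
    (void)arg;
    for (;;) {
        /* atomic_fetch_add returns the value held before the increment,
         * so every thread gets a distinct row index. */
        int row = atomic_fetch_add(&next_row, 1);
        if (row >= N_ROWS)
            break;            /* counter may overshoot, but the index is never used */
        assert(row >= 0);
        atomic_fetch_add(&rows_done, 1);  /* placeholder for the real row update */
    }
    return NULL;
}

/* Run nthreads workers and return how many rows were processed.
 * Error handling is omitted for brevity. */
int run_atomic_demo(int nthreads) {
    pthread_t threads[MAX_THREADS];
    int i;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    atomic_store(&next_row, 0);
    atomic_store(&rows_done, 0);
    for (i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, atomic_worker, NULL);
    for (i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return atomic_load(&rows_done);
}
```

Compared with a mutex, this keeps the hot path lock-free, at the cost of letting the counter run slightly past the bound; that is harmless as long as the returned index is always checked before use.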
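
The two barrier remarks above can be illustrated with a small sketch. It assumes a platform that provides POSIX barriers (for example Linux with glibc); `phase`, `n_phases`, `advances` and `run_barrier_demo` are hypothetical names. Exactly one thread per barrier cycle receives `PTHREAD_BARRIER_SERIAL_THREAD`, so that thread can advance the shared state instead of relying on `tid == 0`. A second wait is one plausible reason for the double call: it keeps the other threads from reading the shared state until the update is finished.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16   /* hypothetical upper bound on worker threads */

static pthread_barrier_t barrier;
static int phase;     /* plays the role of global_norm */
static int n_phases;  /* number of phases to run */
static int advances;  /* how many times the phase was advanced */

static void *phase_worker(void *arg) {
    (void)arg;
    while (phase < n_phases) {
        /* ... per-phase work would go here ... */

        /* First wait: every thread has finished the current phase. */
        int rc = pthread_barrier_wait(&barrier);
        assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
        if (rc == PTHREAD_BARRIER_SERIAL_THREAD) {
            phase++;      /* exactly one thread updates the shared state */
            advances++;
        }
        /* Second wait: nobody re-reads phase until the update is done. */
        rc = pthread_barrier_wait(&barrier);
        assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    }
    return NULL;
}

/* Run nthreads workers through `phases` phases and return how often the
 * phase counter was advanced. Error handling is omitted for brevity. */
int run_barrier_demo(int nthreads, int phases) {
    pthread_t threads[MAX_THREADS];
    int i;
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    phase = 0;
    advances = 0;
    n_phases = phases;
    pthread_barrier_init(&barrier, NULL, (unsigned)nthreads);
    for (i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, phase_worker, NULL);
    for (i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier);
    return advances;
}
```

Because every thread sees the same value of `phase` after the second wait, all threads leave the loop in the same iteration and none is left blocked at a barrier.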