OpenMP function calls in parallel

Keywords: call, parallel, function, OpenMP      Updated: 2023-10-16

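I'm looking for a way to call a function in parallel.

For example, if I have 4 threads, I want each of them to call the same function with its own thread id as the argument.

Because of that argument, no two threads will work on the same data.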

#pragma omp parallel
{
    for(int p = 0; p < numberOfThreads; p++)
    {
        if(p == omp_get_thread_num())
            parDF(p);
    }
}

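Thread 0 should run parDF(0)

Thread 1 should run parDF(1)

Thread 2 should run parDF(2)

Thread 3 should run parDF(3)

All of this should happen at the same time...

This (obviously) does not work, but what is the proper way to make parallel function calls?

EDIT: The actual code (this might be too much information... but it was asked for...)

From the function that calls parDF():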

omp_set_num_threads(NUM_THREADS);
#pragma omp parallel
{
    numberOfThreads = omp_get_num_threads();
    //split nodeQueue
    #pragma omp master
    {
        splitNodeQueue(numberOfThreads);
    }
    int tid = omp_get_thread_num();
    //printf("Hello World from thread = %d\n", tid);
    #pragma omp parallel for private(tid)
    for(int i = 0; i < numberOfThreads; ++i)
    {
            parDF(tid, originalQueueSize, DFlevel);
    }
}

The parDF function:

bool Tree::parDF(int id, int originalQueueSize, int DFlevel)
{
double possibilities[20];
double sequence[3];
double workingSequence[3];
int nodesToExpand = originalQueueSize/omp_get_num_threads();
int tenthsTicks = nodesToExpand/10;
int numPossibilities = 0;
int percentage = 0;
list<double>::iterator i;
list<TreeNode*>::iterator n;
cout << "My ID is: "<< omp_get_thread_num() << endl;
        while(parNodeQueue[id].size() > 0 and parNodeQueue[id].back()->depth == DFlevel)
        {
            if(parNodeQueue[id].size()%tenthsTicks == 0)
            {
                cout << endl;
                cout << percentage*10 << "% done..." << endl;
                if(percentage == 10)
                {
                    percentage = 0;
                }
                percentage++;
            }
            //countStartPoints++;
            depthFirstQueue.push_back(parNodeQueue[id].back());
            numPossibilities = 0;
            for(i = parNodeQueue[id].back()->content.sortedPoints.begin(); i != parNodeQueue[id].back()->content.sortedPoints.end(); i++)
            {
                for(int j = 0; j < deltas; j++)
                {
                    if(parNodeQueue[id].back()->content.doesPointExist((*i) + delta[j]))
                    {
                        for(int k = 0; k <= numPossibilities; k++)
                        {
                            if(fabs((*i) + delta[j] - possibilities[k]) < 0.01)
                            {
                                goto pointAlreadyAdded;
                            }
                        }
                        possibilities[numPossibilities] = ((*i) + delta[j]);
                        numPossibilities++;
                        pointAlreadyAdded:
                        continue;
                    }
                }
            }
            // From the list of possible points, all combinations of 3 are added, building small subtrees from the node.
            // If a subtree successfully breaks the lower bound, true is returned.
            for(int i = 0; i < numPossibilities; i++)
            {
                for(int j = 0; j < numPossibilities; j++)
                {
                    for(int k = 0; k < numPossibilities; k++)
                    {
                        if( k != j and j != i and i != k)
                        {
                            sequence[0] = possibilities[i];
                            sequence[1] = possibilities[j];
                            sequence[2] = possibilities[k];
                            //countSeq++;
                            if(addSequence(sequence, id))
                            {
                                //successes++;
                                workingSequence[0] = sequence[0];
                                workingSequence[1] = sequence[1];
                                workingSequence[2] = sequence[2];
                                parNodeQueue[id].back()->workingSequence[0] = sequence[0];
                                parNodeQueue[id].back()->workingSequence[1] = sequence[1];
                                parNodeQueue[id].back()->workingSequence[2] = sequence[2];
                                parNodeQueue[id].back()->live = false;
                                succesfulNodes.push_back(parNodeQueue[id].back());
                                goto nextNode;
                            }
                            else
                            {
                                destroySubtree(parNodeQueue[id].back());
                            }
                        }
                    }
                }
            }
            nextNode:
            parNodeQueue[id].pop_back();
        }

Is this what you're after?

Live On Coliru

#include <omp.h>
#include <cstdio>
int main()
{
    int nthreads, tid;
#pragma omp parallel private(tid)
    {
        tid = ::omp_get_thread_num();
        printf("Hello World from thread = %d\n", tid);
        /* Only master thread does this */
        if (tid == 0) {
            nthreads = ::omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join master thread and terminate */
}

Output:

Hello World from thread = 0
Number of threads = 8
Hello World from thread = 4
Hello World from thread = 3
Hello World from thread = 5
Hello World from thread = 2
Hello World from thread = 1
Hello World from thread = 6
Hello World from thread = 7

You should do something like this:

int tid;  // must be declared before it can appear in private()
#pragma omp parallel private(tid)
{
    tid = omp_get_thread_num();
    parDF(tid);
}

I think it's quite straightforward.
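
To make that pattern concrete, here is a minimal sketch that can be compiled on its own. It rests on a few assumptions: parDF below is a hypothetical stand-in (the real one takes more arguments and works on parNodeQueue), callOncePerThread and the hits vector are names made up for illustration, and the #ifdef _OPENMP fallback is only there so the sketch also builds serially when OpenMP is not enabled. Keep in mind that num_threads is a request; the runtime may hand out fewer threads, in which case the higher ids are simply never called.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallback so the sketch still builds when OpenMP is not enabled.
static int omp_get_thread_num() { return 0; }
#endif

// Hypothetical stand-in for parDF: counts how often the slot for `id` is visited.
void parDF(int id, std::vector<int>& hits) {
    hits[id] += 1;  // every thread touches only its own slot, so there is no race
}

// Asks for `requestedThreads` threads and lets each one call parDF with its own id.
// Returns the per-id call counts (1 for ids that ran, 0 for ids that did not).
std::vector<int> callOncePerThread(int requestedThreads) {
    std::vector<int> hits(requestedThreads, 0);  // shared between all threads
    #pragma omp parallel num_threads(requestedThreads)
    {
        // Declared inside the parallel region, so each thread has its own copy.
        int tid = omp_get_thread_num();
        parDF(tid, hits);
    }
    return hits;
}
```

Calling callOncePerThread(4) from main and printing the returned vector shows which ids were actually served. Declaring tid inside the region, as here, avoids the need for a private clause altogether.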

There are two ways to achieve what you want:

  1. Exactly the way you describe it: each thread starts the function with its own thread ID:

    #pragma omp parallel
    {
        int threadId = omp_get_thread_num();
        parDF(threadId);
    }
    

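    A parallel block starts as many threads as the system reports are available, and each thread executes the block. Since they differ in threadId, they will process different data. To force a different number of threads, you can add a num_threads(100) clause (or whatever number you need) to the pragma.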

  2. The right way to do what you want is to use a parallel for block.

    #pragma omp parallel for
    for (int i=0; i < numThreads; ++i) {
        parDF(i);
    }
    

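    This way, each iteration of the loop (with its value of i) is assigned to a thread, which executes it. As many iterations run in parallel as there are threads available.

Method 1 is not very general and is inefficient, because you need as many threads as you want function calls. Method 2 is the canonical (right) way to solve your problem.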
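
As a rough illustration of why method 2 is more general, the sketch below makes ten calls no matter how many threads the runtime provides. It is written under assumptions: parDF is again a hypothetical stand-in that just stores a value in its own slot, and runAll and results are illustrative names that do not appear in the question. If the compiler is invoked without OpenMP support, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <vector>

// Hypothetical stand-in for parDF: each call fills in the slot for its own index.
void parDF(int id, std::vector<int>& results) {
    results[id] = id * id;  // distinct indices, so iterations do not interfere
}

// Calls parDF(i) exactly once for every i in [0, numCalls).
// OpenMP distributes the iterations over however many threads are available.
std::vector<int> runAll(int numCalls) {
    std::vector<int> results(numCalls, -1);  // -1 marks "not yet processed"
    #pragma omp parallel for
    for (int i = 0; i < numCalls; ++i) {
        parDF(i, results);
    }
    return results;
}
```

With 4 threads and runAll(10), each thread picks up several iterations; with a single thread the loop degrades to an ordinary for loop. In both cases every index is processed exactly once, which is what method 1 cannot guarantee when fewer threads are granted than requested.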