Using a thread in C++ to report progress of computations

Updated: 2023-10-16

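I am writing a generic abstract class to be able to report the status of as many instance variables as needed. For example, consider the following useless loop: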

int a, b;
for (int i=0; i < 10000; ++i) {
    for (int j=0; j < 1000; ++j) {
        for (int k =0; k < 1000; ++k) {
            a = i;
            b = j;
        }
    }
}

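It would be nice to be able to see the values of a and b without having to modify the loop. In the past I have written if statements such as: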

int a, b;
for (int i=0; i < 10000; ++i) {
    for (int j=0; j < 1000; ++j) {
        for (int k =0; k < 1000; ++k) {
            a = i;
            b = j;
            if (a % 100 == 0) {
                printf("a = %d\n", a);
            }
        }
    }
}

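This lets me see the value of a every 100 iterations. Depending on the computation being done, however, it is sometimes not possible to check progress this way. The idea is to be able to walk away from the computer, come back after some given time, and check whatever values you want to see.

To do this, we can use pthreads. The following code works, and the only reason I am posting it is that I am not sure whether I am using the thread correctly, mainly how to shut it down.

First, let us consider the file "reporter.h":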

#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <time.h>   // nanosleep, struct timespec
void* run_reporter(void*);
class reporter {
public: 
    pthread_t thread;
    bool stdstream;
    FILE* fp;
    struct timespec sleepTime;
    struct timespec remainingSleepTime;
    const char* filename;
    const int sleepT;
    double totalTime;
    reporter(int st, FILE* fp_): fp(fp_), filename(NULL), stdstream(true), sleepT(st) {
        begin_report();
    }
    reporter(int st, const char* fn): fp(NULL), filename(fn), stdstream(false), sleepT(st) {
        begin_report();
    }
    void begin_report() {
        totalTime = 0;
        if (!stdstream) fp = fopen(filename, "w");
        fprintf(fp, "reporting every %d seconds ...\n", sleepT);
        if (!stdstream) fclose(fp);
        pthread_create(&thread, NULL, run_reporter, this);
    }
    void sleep() {
        sleepTime.tv_sec=sleepT;
        sleepTime.tv_nsec=0;
        nanosleep(&sleepTime, &remainingSleepTime);
        totalTime += sleepT;
    }
    virtual void report() = 0;
    void end_report() {
        pthread_cancel(thread);
        // Wrong addition of remaining time, needs to be fixed
        // but non-important at the moment.
        //totalTime += sleepT - remainingSleepTime.tv_sec;
        long sec = remainingSleepTime.tv_sec;
        if (!stdstream) fp = fopen(filename, "a");
        fprintf(fp, "reported for %g seconds.\n", totalTime);
        if (!stdstream) fclose(fp);
    }
};
void* run_reporter(void* rep_){
    reporter* rep = (reporter*)rep_;
    while(1) {
        if (!rep->stdstream) rep->fp = fopen(rep->filename, "a");
        rep->report();
        if (!rep->stdstream) fclose(rep->fp);
        rep->sleep();
    }
}

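This file declares the abstract class reporter; note the pure virtual function report. This is the function that will print the messages. Each reporter has its own thread, and the thread is created when the reporter constructor is called. To use a reporter object in our useless loop, we can now do the following: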

#include "reporter.h"
int main() {
    // Declaration of objects we want to track
    int a = 0;
    int b = 0;
    // Declaration of reporter
    class prog_reporter: public reporter {
    public:
        int& a;
        int& b;
        prog_reporter(int& a_, int& b_):
            a(a_), b(b_),
            reporter(3, stdout)
        {}
        void report() {
            fprintf(fp, "(a, b) = (%d, %d)\n", this->a, this->b);
        }
    };
    // Start tracking a and b every 3 seconds
    prog_reporter rep(a, b);
    // Do some useless computation
    for (int i=0; i < 10000; ++i) {
        for (int j=0; j < 1000; ++j) {
            for (int k =0; k < 1000; ++k) {
                a = i;
                b = j;
            }
        }
    }
    // Stop reporting
    rep.end_report();
}

After compiling this code (with no optimization flags) and running it, I obtain:

macbook-pro:Desktop jmlopez$ g++ testing.cpp
macbook-pro:Desktop jmlopez$ ./a.out 
reporting every 3 seconds ...
(a, b) = (0, 60)
(a, b) = (1497, 713)
(a, b) = (2996, 309)
(a, b) = (4497, 478)
(a, b) = (5996, 703)
(a, b) = (7420, 978)
(a, b) = (8915, 78)
reported for 18 seconds.

This is exactly what I wanted it to do. With optimization flags, however, I get:

macbook-pro:Desktop jmlopez$ g++ testing.cpp -O3
macbook-pro:Desktop jmlopez$ ./a.out 
reporting every 3 seconds ...
(a, b) = (0, 0)
reported for 0 seconds.

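This is no surprise, since the compiler probably rewrote my code to give the same answer in less time. My original question was why the reporter did not give me the values of the variables when I made the loops longer, for instance: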

for (int i=0; i < 1000000; ++i) {
    for (int j=0; j < 100000; ++j) {
        for (int k =0; k < 100000; ++k) {
            a = i;
            b = j;
        }
    }
}

After running the code again with the optimization flags:

macbook-pro:Desktop jmlopez$ g++ testing.cpp -O3
macbook-pro:Desktop jmlopez$ ./a.out 
reporting every 3 seconds ...
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
(a, b) = (0, 0)
reported for 39 seconds.

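Question: Is this output due to the optimization flag modifying the code so that it simply decides not to update the variables until the very end?

Main question:

In the reporter method end_report I call the function pthread_cancel. After reading the following answer, I became doubtful about my use of the function and about how I terminate the reporting thread. For those experienced with pthreads, are there any obvious holes or potential problems with using the thread as I did?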

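C++ doesn't know about threads, and your code uses two local variables a and b and calls no functions whose code is unknown.

What happens is that a and b end up in registers during the loop's computation, and their memory locations are only written at the end of the loop.

Although a and b must have a real memory address (because they were passed by reference to an external function), the compiler does not know that some external code that knows the addresses of a and b will execute during the loop, so it prefers to keep all intermediate values in registers until the loop ends.

If, however, the code in the loop called an unknown function (that is, one whose implementation is not visible to the compiler), the compiler would be forced to update a and b before calling it, because it has to be paranoid and consider that the progress function, which was passed the addresses of a and b, might have handed this information to the unknown function.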
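One way to keep the stores visible under -O3 without calling an opaque function is to tell the compiler that another thread reads the values. The following is only a minimal sketch, assuming a C++11 compiler (the question's code does not use C++11 features). The names progress and run_loop are hypothetical and not part of the original code. As far as I know, current GCC and Clang emit every atomic store rather than merging them, but treat that as an observation about today's compilers, not a guarantee.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical progress counters shared with a reporting thread.
// Stores to std::atomic are visible to other threads, so the value
// is not simply parked in a register until the loop ends.
struct progress {
    std::atomic<int> a{0};
    std::atomic<int> b{0};
};

// The "useless" loop from the question, with the bounds as parameters.
void run_loop(progress& p, int ni, int nj, int nk) {
    assert(ni >= 0 && nj >= 0 && nk >= 0);  // bounds are iteration counts
    for (int i = 0; i < ni; ++i) {
        for (int j = 0; j < nj; ++j) {
            for (int k = 0; k < nk; ++k) {
                // Relaxed ordering is enough here: the reporter only
                // needs a recent value, not any ordering between a and b.
                p.a.store(i, std::memory_order_relaxed);
                p.b.store(j, std::memory_order_relaxed);
            }
        }
    }
}
```

The reporter's report() would then read p.a.load(std::memory_order_relaxed) and p.b.load(std::memory_order_relaxed) instead of holding plain int references. The atomic stores may make the loop slower than the fully optimized version.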

Regarding the main question: you are close. Add a call to pthread_join() (http://linux.die.net/man/3/pthread_join) after pthread_cancel(), and everything should be fine.

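The join call ensures that the thread's resources are cleaned up; forgetting it can, in some circumstances, lead to running out of thread resources.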
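The cancel-then-join sequence can be sketched as follows. This is a standalone illustration, not a patch to reporter.h. The function names sleepy_worker, start_sleepy_worker and cancel_and_join are hypothetical, and the worker stands in for the reporter loop by sleeping in nanosleep().

```cpp
#include <cassert>
#include <pthread.h>
#include <time.h>

// Hypothetical worker: sleeps forever in one-second steps.
// nanosleep() is a cancellation point, so pthread_cancel() can stop it.
void* sleepy_worker(void*) {
    struct timespec t;
    t.tv_sec = 1;
    t.tv_nsec = 0;
    while (true) {
        nanosleep(&t, NULL);
    }
    return NULL;  // never reached
}

// Start the worker thread.
pthread_t start_sleepy_worker() {
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, sleepy_worker, NULL);
    assert(rc == 0);  // sketch only: real code should handle the error
    (void)rc;
    return thread;
}

// Request cancellation, then join so the thread's resources are released.
// Returns true if the thread ended because of the cancellation.
bool cancel_and_join(pthread_t thread) {
    if (pthread_cancel(thread) != 0) return false;
    void* result = NULL;
    if (pthread_join(thread, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

In end_report() the equivalent change would be a pthread_join(thread, NULL) call right after pthread_cancel(thread). On most toolchains this needs to be built with the -pthread flag.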

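To add to this: an important point when using pthread_cancel() (besides remembering to join the thread) is to make sure that the thread being cancelled has a so-called cancellation point. Your thread has one because it calls nanosleep() (and possibly also fopen, fprintf and fclose, which may be cancellation points). If no cancellation point exists, your thread will simply keep running.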
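If a thread only computes and never reaches such a call, it can create a cancellation point itself with pthread_testcancel(). A minimal sketch, assuming the default deferred cancellation type; busy_worker, start_busy_worker and cancel_and_join are hypothetical names, and the counter increment is a stand-in for real work:

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

// Hypothetical compute-only worker. The loop body makes no blocking
// library calls, so pthread_testcancel() is its only cancellation point.
void* busy_worker(void* arg) {
    std::atomic<unsigned long>* counter =
        static_cast<std::atomic<unsigned long>*>(arg);
    while (true) {
        counter->fetch_add(1, std::memory_order_relaxed);  // stand-in for work
        pthread_testcancel();  // act on a pending cancellation request here
    }
    return NULL;  // never reached
}

// Start the worker on the given counter.
pthread_t start_busy_worker(std::atomic<unsigned long>* counter) {
    pthread_t thread;
    int rc = pthread_create(&thread, NULL, busy_worker, counter);
    assert(rc == 0);  // sketch only: real code should handle the error
    (void)rc;
    return thread;
}

// Request cancellation and wait for the thread to finish.
// Returns true if the thread ended because of the cancellation.
bool cancel_and_join(pthread_t thread) {
    if (pthread_cancel(thread) != 0) return false;
    void* result = NULL;
    if (pthread_join(thread, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

Without the pthread_testcancel() call, the join in this sketch would never return, because the worker would never notice the request.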