How to encode a video from several images generated in a C++ program without writing the separate frame images to disk?

Updated: 2023-10-16

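I am writing a C++ program that, after performing some operations implemented in it, generates a sequence of N different frames. As each frame is completed, I write it to disk as IMG_%d.png, and at the end I encode the frames into a video through ffmpeg using the x264 codec.

The summarized pseudocode of the main part of the program is the following: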

std::vector<int> B(width*height*3);
for (i=0; i<N; i++)
{
  // void generateframe(std::vector<int> &, int)
  generateframe(B, i); // Returns different images for different i values.
  sprintf(s, "IMG_%d.png", i+1);
  WriteToDisk(B, s); // void WriteToDisk(std::vector<int>, char[])
}

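The problem with this implementation is that the number of desired frames, N, is usually high (N~100000), as is the resolution of the pictures (1920x1080), which overloads the disk: every execution produces write cycles of dozens of GB.

To avoid this, I have been trying to find documentation about passing each image stored in the vector B directly to an encoder such as x264 (without having to write the intermediate image files to disk). Although I found some interesting topics, none of them solved exactly what I want: many concern running the encoder on image files that already exist on disk, while others provide solutions for other programming languages such as Python (for which a fully satisfactory solution exists).

The pseudocode of what I would like to obtain is something similar to this: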

std::vector<int> B(width*height*3);
video_file=open_video("Generated_Video.mp4", ...[encoder options]...);
for (i=0; i<N; i++)
{
  generateframe(B, i+1);
  add_frame(video_file, B);
}
video_file.close();

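According to what I have read in related topics, the x264 C++ API might be able to do this but, as stated above, I did not find a satisfactory answer to my specific question. I tried learning and using the ffmpeg source code directly, but its low ease of use and compilation issues forced me to discard this possibility, since I am just a non-professional programmer (I take it as a hobby and, unfortunately, I cannot spend that much time learning something so demanding).

Another possible solution that came to my mind is to find a way to call the ffmpeg binary from the C++ code and somehow transfer the image data of each iteration (stored in B) to the encoder, adding each frame without "closing" the video file being written until the last one, so that more frames can be added until the N-th frame is reached, at which point the video file is "closed". In other words: call ffmpeg.exe from the C++ program to write the first frame to a video, but make the encoder "wait" for more frames; then call ffmpeg again to add the second frame and make the encoder "wait" again, and so on until the last frame, where the video is finished. However, I do not know how to proceed, or whether it is actually possible.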
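One way this idea might be realized, without restarting ffmpeg for every frame, is to start the ffmpeg binary once and stream raw RGB bytes into its standard input. The sketch below assumes a POSIX-like environment where popen is available, an ffmpeg build with libx264 on the PATH, and frames stored as 8-bit RGB triplets. The helper names build_ffmpeg_command and encode_frames, the callback signature and the encoder settings are illustrative choices, not something taken from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>      // popen/pclose are POSIX extensions declared here
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: builds a command line that makes ffmpeg read raw
// RGB24 frames from its standard input ("-i -") and encode them with libx264.
std::string build_ffmpeg_command(int width, int height, int fps,
                                 const std::string &output)
{
    return "ffmpeg -y -f rawvideo -pix_fmt rgb24 -s " +
           std::to_string(width) + "x" + std::to_string(height) +
           " -r " + std::to_string(fps) +
           " -i - -c:v libx264 -preset slow -crf 20 -pix_fmt yuv420p " +
           output;
}

// Hypothetical helper: starts `command`, calls `generate` once per frame and
// writes each frame to the child's standard input. Returns the number of
// frames written in full, or -1 if the process could not be started.
int encode_frames(const std::string &command, int width, int height, int n,
                  const std::function<void(std::vector<uint8_t> &, int)> &generate)
{
    FILE *pipe = popen(command.c_str(), "w");
    if (!pipe)
        return -1;

    std::vector<uint8_t> frame(static_cast<std::size_t>(width) * height * 3);
    int written = 0;
    for (int i = 0; i < n; ++i) {
        generate(frame, i);  // fill the buffer with frame i
        if (fwrite(frame.data(), 1, frame.size(), pipe) != frame.size())
            break;           // the child stopped reading, give up early
        ++written;
    }
    pclose(pipe);  // closes the child's stdin so the encoder can finish the file
    return written;
}
```

A call such as encode_frames(build_ffmpeg_command(1920, 1080, 25, "Generated_Video.mp4"), 1920, 1080, N, generateframe) would then replace the loop that writes PNG files, provided generateframe fills a std::vector<uint8_t>. Under Cygwin the program may need to be compiled with GNU extensions enabled (for example -std=gnu++11) for popen to be declared.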

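Edit 1:

As suggested in the replies, I have been reading about named pipes and tried to use them in my code. First of all, it should be noted that I am working with Cygwin, so my named pipes are created as they would be under Linux. The modified pseudocode I used (including the corresponding system libraries) is the following: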

FILE *fd;
mkfifo("myfifo", 0666);
for (i=0; i<N; i++)
{
  fd=fopen("myfifo", "wb");
  generateframe(B, i+1);
  WriteToPipe(B, fd); // void WriteToPipe(std::vector<int>, FILE *&fd)
  fflush(fd);
  fclose(fd);
}
unlink("myfifo");

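WriteToPipe is a slight modification of the previous WriteToDisk function, in which I made sure that the write buffer used to send the image data is small enough to fit the pipe buffering limits.

Then I compile the program and run the following command in the Cygwin terminal: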

./myprogram | ffmpeg -i pipe:myfifo -c:v libx264 -preset slow -crf 20 Video.mp4

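However, the program remains stuck in the loop when i=0, at the "fopen" line (that is, at the first fopen call). If I had not called ffmpeg this would be natural, since the server (my program) would be waiting for a client program to connect to the "other side" of the pipe, but that is not the case. It looks like they cannot be connected through the pipe, and I have not been able to find further documentation to solve this issue. Any suggestions?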

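After some intense struggle, I finally managed to make it work after learning a bit how to use the FFmpeg and libx264 C APIs for my specific purpose, thanks to the useful information that some users provided on this site and on some others, as well as some of FFmpeg's documentation examples. For the sake of illustration, the details are presented next.

First of all, the libx264 C library was compiled and, after that, the FFmpeg one with the configure options --enable-gpl --enable-libx264. Now let us go to the coding. The relevant part of the code that achieves the requested purpose is the following:

Includes: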

#include <stdint.h>
extern "C"{
#include <x264.h>
#include <libswscale/swscale.h>
#include <libavcodec/avcodec.h>
#include <libavutil/mathematics.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
}

LDFLAGS in the Makefile:

-lx264 -lswscale -lavutil -lavformat -lavcodec

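Inner code (for the sake of simplicity, error checking is omitted, and variables are declared where they are first needed rather than at the beginning, for better understanding):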

av_register_all(); // Loads the whole database of available codecs and formats.
struct SwsContext* convertCtx = sws_getContext(width, height, AV_PIX_FMT_RGB24, width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL); // Preparing to convert my generated RGB images to YUV frames.
// Preparing the data concerning the format and codec in order to write properly the header, frame data and end of file.
const char *fmtext="mp4";
char filename[32]; // Large enough for "GeneratedVideo." plus the extension.
snprintf(filename, sizeof(filename), "GeneratedVideo.%s", fmtext);
AVOutputFormat * fmt = av_guess_format(fmtext, NULL, NULL);
AVFormatContext *oc = NULL;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVStream * stream = avformat_new_stream(oc, 0);
AVCodec *codec=NULL;
AVCodecContext *c= NULL;
int ret;
codec = avcodec_find_encoder_by_name("libx264");
// Setting up the codec:
AVDictionary *opt = NULL; // Encoder options; must start as NULL for av_dict_set.
av_dict_set( &opt, "preset", "slow", 0 );
av_dict_set( &opt, "crf", "20", 0 );
avcodec_get_context_defaults3(stream->codec, codec);
c=avcodec_alloc_context3(codec);
c->width = width;
c->height = height;
c->pix_fmt = AV_PIX_FMT_YUV420P;
// Setting up the format, its stream(s), linking with the codec(s) and write the header:
if (oc->oformat->flags & AVFMT_GLOBALHEADER) // Some formats require a global header.
    c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
avcodec_open2( c, codec, &opt );
av_dict_free(&opt);
stream->time_base=(AVRational){1, 25};
stream->codec=c; // Once the codec is set up, we need to let the container know which codec are the streams using, in this case the only (video) stream.
av_dump_format(oc, 0, filename, 1);
avio_open(&oc->pb, filename, AVIO_FLAG_WRITE);
ret=avformat_write_header(oc, &opt);
av_dict_free(&opt); 
// Preparing the containers of the frame data:
AVFrame *rgbpic, *yuvpic;
// Allocating memory for each RGB frame, which will be lately converted to YUV:
rgbpic=av_frame_alloc();
rgbpic->format=AV_PIX_FMT_RGB24;
rgbpic->width=width;
rgbpic->height=height;
ret=av_frame_get_buffer(rgbpic, 1);
// Allocating memory for each conversion output YUV frame:
yuvpic=av_frame_alloc();
yuvpic->format=AV_PIX_FMT_YUV420P;
yuvpic->width=width;
yuvpic->height=height;
ret=av_frame_get_buffer(yuvpic, 1);
// After the format, codec and general frame data are set, we write the video in the frame generation loop:
// std::vector<uint8_t> B(width*height*3);

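The vector commented out above has the same structure as the one I presented in my question; however, the RGB data is stored in the AVFrames in a specific way. Therefore, for the sake of exposition, let us assume we have instead a pointer to a structure of the form uint8_t[3] Matrix(int, int), whose way of accessing the color values of the pixel at a given coordinate (x, y) is Matrix(x, y)->Red, Matrix(x, y)->Green and Matrix(x, y)->Blue, which give the red, green and blue values at (x, y) respectively. The first argument is the horizontal position, from left to right as x increases, and the second is the vertical position, from top to bottom as y increases.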
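The answer does not show the Matrix type itself. A minimal sketch of a container with that access pattern could look as follows; the names Pixel, Matrix and copy_to_rgb24 and the row-major storage are assumptions made for illustration, not the author's actual implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type: three 8-bit channels.
struct Pixel {
    uint8_t Red, Green, Blue;
};

// Hypothetical image container matching the access pattern described above:
// B(x, y)->Red, with x growing to the right and y growing downwards.
class Matrix {
public:
    Matrix(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) * height) {}

    Pixel *operator()(int x, int y) {
        return &pixels_[static_cast<std::size_t>(y) * width_ + x];
    }
    int width() const { return width_; }
    int height() const { return height_; }

private:
    int width_, height_;
    std::vector<Pixel> pixels_;  // row-major storage, one Pixel per (x, y)
};

// Copies the image into a packed RGB24 plane whose rows are `linesize`
// bytes apart (linesize >= 3 * width), the layout of an AVFrame's data[0].
void copy_to_rgb24(Matrix &m, uint8_t *dst, int linesize)
{
    for (int y = 0; y < m.height(); ++y) {
        uint8_t *row = dst + static_cast<std::size_t>(y) * linesize;
        for (int x = 0; x < m.width(); ++x) {
            row[3 * x]     = m(x, y)->Red;
            row[3 * x + 1] = m(x, y)->Green;
            row[3 * x + 2] = m(x, y)->Blue;
        }
    }
}
```

Indexing each row through the line size instead of 3*width keeps the copy correct when the frame buffer was allocated with padding at the end of each row.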

That said, the for loop that transfers the data, encodes and writes each frame would be the following one:

Matrix B(width, height);
int got_output;
AVPacket pkt;
for (i=0; i<N; i++)
{
    generateframe(B, i); // This one is the function that generates a different frame for each i.
    // The AVFrame data will be stored as RGBRGBRGB... row-wise, from left to right and from top to bottom, hence we have to proceed as follows:
    for (y=0; y<height; y++)
    {
        for (x=0; x<width; x++)
        {
            // For RGB24, each row of rgbpic->data[0] spans rgbpic->linesize[0] bytes, which is at least 3*width.
            rgbpic->data[0][y*rgbpic->linesize[0]+3*x]=B(x, y)->Red;
            rgbpic->data[0][y*rgbpic->linesize[0]+3*x+1]=B(x, y)->Green;
            rgbpic->data[0][y*rgbpic->linesize[0]+3*x+2]=B(x, y)->Blue;
        }
    }
    sws_scale(convertCtx, rgbpic->data, rgbpic->linesize, 0, height, yuvpic->data, yuvpic->linesize); // Not actually scaling anything, but just converting the RGB data to YUV and store it in yuvpic.
    av_init_packet(&pkt);
    pkt.data = NULL;
    pkt.size = 0;
    yuvpic->pts = i; // The PTS of the frame are just in a reference unit, unrelated to the format we are using. We set them, for instance, as the corresponding frame number.
    ret=avcodec_encode_video2(c, &pkt, yuvpic, &got_output);
    if (got_output)
    {
        fflush(stdout);
        av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base); // We set the packet PTS and DTS taking in the account our FPS (second argument) and the time base that our selected format uses (third argument).
        pkt.stream_index = stream->index;
        printf("Write frame %6d (size=%6d)\n", i, pkt.size);
        av_interleaved_write_frame(oc, &pkt); // Write the encoded frame to the mp4 file.
        av_packet_unref(&pkt);
    }
}
// Writing the delayed frames:
for (got_output = 1; got_output; i++) {
    ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
    if (got_output) {
        fflush(stdout);
        av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base);
        pkt.stream_index = stream->index;
        printf("Write frame %6d (size=%6d)\n", i, pkt.size);
        av_interleaved_write_frame(oc, &pkt);
        av_packet_unref(&pkt);
    }
}
av_write_trailer(oc); // Writing the end of the file.
if (!(fmt->flags & AVFMT_NOFILE))
    avio_closep(&oc->pb); // Closing the file.
avcodec_close(stream->codec);
// Freeing all the allocated memory:
sws_freeContext(convertCtx);
av_frame_free(&rgbpic);
av_frame_free(&yuvpic);
avformat_free_context(oc);

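Side notes:

For future reference, since the information available on the net about time stamps (PTS/DTS) looks so confusing, I will next explain how I managed to solve the issues by setting the proper values. Setting these values incorrectly caused the output size to be much bigger than the one obtained through the ffmpeg command line tool, because the frame data was being written redundantly at time intervals smaller than the ones actually set by the FPS.

First of all, it should be remarked that when encoding there are two kinds of time stamps: one associated with the frame (PTS) (pre-encoding stage) and two associated with the packet (PTS and DTS) (post-encoding stage). In the first case, it looks like the frame PTS values can be assigned using a custom unit of reference (with the only restriction that they must be equally spaced if one wants constant FPS), so one can take, for instance, the frame number as we did in the code above. In the second case, we have to take into account the following parameters: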

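The time base of the output format container, in our case mp4 (=12800 Hz), whose information is held in stream->time_base.
The desired FPS of the video.
Whether or not the encoder is generating B-frames (if it is not, the PTS and DTS values of a packet must be set to the same value; if it is, things are more complicated, as in this example). See this answer to another related question for more references.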

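The key here is that, luckily, it is not necessary to struggle with the computation of these quantities, as libav provides a function that computes the correct time stamps associated with the packet from the aforementioned data: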

av_packet_rescale_ts(AVPacket *pkt, AVRational FPS, AVRational time_base)

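Thanks to these considerations, I was finally able to generate a sane output container with essentially the same compression rate as the one obtained using the command line tool. These were the two remaining issues before I investigated more deeply how the format header and trailer, and the time stamps, are properly set.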
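As a worked illustration of what the rescaling amounts to: converting a time stamp from a source time base a to a destination time base b multiplies it by a/b. With frames counted in units of 1/25 s and, for example, a stream time base of 1/12800 s, frame i ends up at time stamp 512*i. The sketch below reproduces that arithmetic with plain integers. Rational and rescale are stand-ins written for this example, not FFmpeg's AVRational and av_rescale_q, which additionally guard against overflow and support several rounding modes.

```cpp
#include <cstdint>

// Stand-in for a rational number num/den (not FFmpeg's AVRational).
struct Rational {
    int num;
    int den;
};

// Converts time stamp `ts` from time base `src` to time base `dst`,
// rounding to the nearest integer:
//   ts * (src.num / src.den) / (dst.num / dst.den)
// Assumes a non-negative ts and values small enough not to overflow int64_t.
int64_t rescale(int64_t ts, Rational src, Rational dst)
{
    const int64_t num = static_cast<int64_t>(src.num) * dst.den;
    const int64_t den = static_cast<int64_t>(src.den) * dst.num;
    return (ts * num + den / 2) / den;
}
```

For instance, rescale(1, Rational{1, 25}, Rational{1, 12800}) evaluates to 512, so consecutive frames are spaced 512 ticks apart in the container.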

Thanks for your excellent work, @ksb496!

One minor improvement:

c=avcodec_alloc_context3(codec);

should better be written as:

c = stream->codec;

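to avoid memory leaks.

If you don't mind, I've uploaded the complete ready-to-deploy library to GitHub: https://github.com/apc-llc/moviemaker-cpp.git

Thanks to ksb496 I managed to do this task, but in my case I needed to change some of the code to make it work as expected. I thought it might help others, so I decided to share (with two years' delay :D).

I had an RGB buffer filled by a DirectShow sample grabber that I needed to make a video from. The RGB to YUV conversion from the given answer did not do the job for me. I did it like this: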

int stride = m_width * 3;
int index = 0;
for (int y = 0; y < m_height; y++) {
    for (int x = 0; x < stride; x++) {
        int j = (size - ((y + 1)*stride)) + x;
        m_rgbpic->data[0][j] = data[index];
        ++index;
    }
}

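Here the data variable is my RGB buffer (a plain BYTE*) and size is the size of the data buffer in bytes. It fills the RGB AVFrame starting from the bottom left and going to the top right.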
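Read row by row, the loop above writes source row y into destination row height-1-y, which is a vertical flip; presumably the capture buffer is stored bottom-up, although that is an assumption. The same operation can be sketched with one copy per row. The function name copy_flipped and its parameters are hypothetical, and the destination row distance is passed separately so the sketch also works for a frame whose rows are padded.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Copies a tightly packed 24-bit image from `src` into `dst` with the row
// order reversed: source row 0 becomes the last destination row.
// `dst_linesize` is the distance in bytes between destination rows.
void copy_flipped(const uint8_t *src, int width, int height,
                  uint8_t *dst, int dst_linesize)
{
    const std::size_t stride = static_cast<std::size_t>(width) * 3;
    for (int y = 0; y < height; ++y) {
        const uint8_t *src_row = src + static_cast<std::size_t>(y) * stride;
        uint8_t *dst_row =
            dst + static_cast<std::size_t>(height - 1 - y) * dst_linesize;
        std::copy(src_row, src_row + stride, dst_row);
    }
}
```

With the names used in this answer, the call would be along the lines of copy_flipped(data, m_width, m_height, m_rgbpic->data[0], m_rgbpic->linesize[0]).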

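The other thing is that my version of FFmpeg did not have the av_packet_rescale_ts function. It is the latest version, but the FFmpeg docs do not say anywhere that this function is deprecated, so I guess this might be the case only on Windows. Anyway, I used av_rescale_q instead, which does the same job. Like this: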

AVPacket pkt;
pkt.pts = av_rescale_q(pkt.pts, { 1, 25 }, m_stream->time_base);

And the last thing: with this format conversion I needed to change my swsContext to BGR24 instead of RGB24, like this:

m_convert_ctx = sws_getContext(width, height, AV_PIX_FMT_BGR24, width, height,
        AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);

avcodec_encode_video2 and avcodec_encode_audio2 seem to be deprecated. The current version (4.2) of FFmpeg has a new API: avcodec_send_frame and avcodec_receive_packet.