Fast image (or matrix) transpose implementation in C++

Keywords: transpose, implementation, image, C++. Updated: 2023-10-16

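An earlier post discusses how to transpose an image using OpenCV. Here I want to go a step further: assuming the image is grayscale, what is the fastest way to transpose it (or a matrix) in C++? My solution is as follows: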

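    // Image data is stored row by row in image.buffer_ (8-bit grayscale).
    // Needs <cstdlib> for malloc/free and <cstring> for memcpy.
    unsigned char *mem = (unsigned char *) malloc(image.bufferSize_);
    int height = image.Height();
    int width = image.Width();
    for(int i=0; i<height; i++)
    {
        unsigned char *ptr = image.buffer_ + i*width;   // start of source row i
        for(int j=0; j<width; j++)
            *(mem + j*height + i) = *(ptr + j);         // pixel (i, j) goes to (j, i)
    }

    // Copy the transposed pixels back over the original buffer.
    memcpy(image.buffer_, mem, image.bufferSize_);
    free(mem);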

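Some explanation of the code above: we create an image object that holds the basic image information together with the image pixels (in image.buffer_). We assume the pixels in image.buffer_ are stored row by row. Any ideas for improving the code above further?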
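To make the row-by-row layout assumption concrete, here is a minimal self-contained sketch of the same nested-loop transpose. `GrayImage` is a hypothetical stand-in for the image class in the question (whose real definition is not shown), and the function returns a new image with swapped dimensions instead of copying back into the original buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's image class: 8-bit grayscale,
// pixels stored row by row in one contiguous buffer.
struct GrayImage {
    std::size_t height;
    std::size_t width;
    std::vector<unsigned char> pixels;  // size == height * width
};

// Out-of-place transpose: pixel (i, j) of the input becomes pixel (j, i)
// of the result, whose height and width are swapped.
GrayImage transpose(const GrayImage& in) {
    assert(in.pixels.size() == in.height * in.width);
    GrayImage out{in.width, in.height,
                  std::vector<unsigned char>(in.pixels.size())};
    for (std::size_t i = 0; i < in.height; ++i) {
        // Start of source row i.
        const unsigned char* row = in.pixels.data() + i * in.width;
        for (std::size_t j = 0; j < in.width; ++j) {
            out.pixels[j * in.height + i] = row[j];
        }
    }
    return out;
}
```

Returning a separate image keeps the swapped width and height next to the transposed pixels, which the copy-back approach leaves to the caller.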

Without touching the malloc/free part, the copying part could look like this:

    size_t len = image.bufferSize_,
           len1 = len - 1;
    unsigned char *src = image.buffer_,
                  *dest = mem,
                  *end = dest + len;
    for(size_t i = 0; i < len; i++)
    {
        *dest++ = *src;  // dest moves to the next element
        src += width;    // src moves down one row, staying in the same column
        // src wraps around to the top of the next column
        // (comparing src with the end of dest is wrong; the edit below corrects this)
        if (src > end) src -= len1;
    }

This is equivalent to iterating over the destination sequentially while the source is walked column by column.
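The walk described above can be sketched with offsets instead of raw pointers, which avoids forming pointers past the end of the buffer. This is only an illustration under the row-major assumption from the question; `transpose_strided` and its parameters are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes a row-major height x width buffer into a row-major
// width x height buffer. The destination is written sequentially; the
// source offset advances by one row (width elements) per step and wraps
// to the top of the next column once it runs past the end.
std::vector<unsigned char> transpose_strided(
        const std::vector<unsigned char>& src,
        std::size_t height, std::size_t width) {
    const std::size_t len = height * width;
    assert(src.size() == len);
    std::vector<unsigned char> dst(len);
    std::size_t s = 0;  // current offset into src
    for (std::size_t d = 0; d < len; ++d) {
        dst[d] = src[s];
        s += width;                  // same column, next row of the source
        if (s >= len) s -= len - 1;  // past the last row: top of next column
    }
    return dst;
}
```

For a 2 x 3 input, the source offsets are visited in the order 0, 3, 1, 4, 2, 5, that is, down each column in turn.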

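I have not actually tested it, but I expect this to be faster: the inner loop performs 3 offset calculations, compared with 4 in your version (both versions perform 2 dereferences).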
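Since the speed claim is untested, it may be worth measuring rather than counting operations. The sketch below is one possible timing helper built on `std::chrono::steady_clock`; `best_time_ms` and `transpose_nested` are hypothetical names, and the nested-loop function is included only so there is something to time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Baseline to time: the nested-loop transpose from the question,
// written against raw pointers.
void transpose_nested(const unsigned char* src, unsigned char* dst,
                      std::size_t height, std::size_t width) {
    for (std::size_t i = 0; i < height; ++i)
        for (std::size_t j = 0; j < width; ++j)
            dst[j * height + i] = src[i * width + j];
}

// Calls f() `repeats` times and returns the fastest run in milliseconds.
// Taking the minimum reduces noise from other processes.
template <class F>
double best_time_ms(F f, int repeats) {
    assert(repeats > 0);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best) best = ms;
    }
    return best;
}
```

A call such as `best_time_ms([&] { transpose_nested(src.data(), dst.data(), h, w); }, 10)` times one variant; wrapping the other variant in the same way gives a like-for-like comparison. Results depend heavily on image size and on compiling with optimizations enabled.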

Edit

One more improvement and one correction:

    //...
    unsigned char *src = image.buffer_,
                  *src_end = src + len,
                  *dest = mem,
                  *dest_end = dest + len;
    while (dest != dest_end)
    {
        *dest++ = *src;  // dest moves to the next element
        src += width;    // src moves down one row, staying in the same column
        // src wraps around to the top of the next column; >= is needed
        // because the first column ends exactly at src_end
        if (src >= src_end) src -= len1;
    }

This saves one operation per iteration (the i++ in the for loop). In addition, src was previously compared against the wrong end pointer.