用于将YUYV拆分为平面的NEON优化

NEON Optimization for Splitting YUYV into Planes

本文关键字：NEON 优化平面 YUYV 拆分用于更新时间：2023-10-16

我想学习如何使用NEON将YUYV拆分为Y、U和V平面，以便稍后将数据作为OpenGL纹理提供给GPU。

目前，我在C++中这样做：

/**
* TopOpenGL splitYuvPlanes()
* Purpose: splitYuvPlanes - Split YUYV into 3 arrays - one for each component
*
* @param data - input data
* @param size - input data size
* @param y - array to store output channels
* @param u - array to store output channels
* @param v - array to store output channels
*/
void TopOpenGL::splitYuvPlanes(unsigned char *data, int size, unsigned char *y, unsigned char *u, unsigned char *v)
{
    // This case takes RGBA -> BGRA
//    __asm__ volatile(
//                "mov r3, r3, lsr #3n"           /* Divide number of pixels by 8 because we process them 8 at a time */
//                "loopRGBA:n"
//                "vld4.8 {d0-d3}, [r1]!n"        /* Load 8 pixels into d0 through d2. d0 = R[0-7], d1 = G[0-7], d2 = B[0-7], d3 = A[0-7] */
//                "subs r3, r3, #1n"              /* Decrement the loop counter */
//                "vswp d0, d2n"                  /* Swap R and B channels */
//                "vst4.8 {d0-d3}, [r2]!n"        /* Store the RGBA into destination 8 pixels at a time */
//                "bgt loopRGBAn"
//                "bx lrn"
//                );
    for ( int c = 0 ; c < ( size - 4 ) ; c+=4 ) {
        *y = *data; // Y0
        data++;
        *u = *data; // U0
        u++;
        *u = *data; // U0
        data++;
        y++;
        *y = *data; // Y1
        data++;
        *v = *data; // V0
        v++;
        *v = *data; // V0
        data++;
        y++;
        u++;
        v++;
    }
}

如何使用NEON将其拆分为char*y、char*u和char*v？非常感谢。

我找到了这个博客，但它并不是我想要的。http://blog.lumberlabs.com/2011/04/efficiently-splitting-cbcr-plane-with.html

以下代码实现了将YUYV帧拆分为Y、U和V平面的目标。

/// This structure is passed to ARM Assembly code
/// to split the YUV frame into seperate planes for
/// OpenGL Consumption
typedef struct {
    uchar *input_data;
    uint32_t input_size;
    uchar *y_plane;
    uchar *u_plane;
    uchar *v_plane;
} yuvSplitStruct;
void TopOpenGL::splitYuvPlanes(yuvSplitStruct *yuvStruct)
{
    __asm__ volatile(
                "PUSH {r4}n"                            /* Save callee-save registers R4 and R5 on the stack */
                "PUSH {r5}n"                            /* r1 is the pointer to the input structure ( r0 is 'this' because c++ ) */
                "ldr r0 , [r1]n"                        /* reuse r0 scratch register for the address of our frame input */
                "ldr r2 , [r1, #4]n"                    /* use r2 scratch register to store the size in bytes of the YUYV frame */
                "ldr r3 , [r1, #8]n"                    /* use r3 scratch register to store the destination Y plane address */
                "ldr r4 , [r1, #12]n"                   /* use r4 register to store the destination U plane address */
                "ldr r5 , [r1, #16]n"                   /* use r5 register to store the destination V plane address */
                    "mov r2, r2, lsr #5n"               /* Divide number of bytes by 32 because we process 16 pixels at a time */
                    "loopYUYV:n"
                        "vld4.8 {d0-d3}, [r0]!n"        /* Load 8 YUYV elements from our frame into d0-d3, increment frame pointer */
                        "vst2.8 {d0,d2}, [r3]!n"        /* Store both Y elements into destination y plane, increment plane pointer */
                        "vmov.F64 d0, d1n"              /* Duplicate U value */
                        "vst2.8 {d0,d1}, [r4]!n"        /* Store both U elements into destination u plane, increment plane pointer */
                        "vmov.F64 d1, d3n"              /* Duplicate V value */
                        "vst2.8 {d1,d3}, [r5]!n"        /* Store both V elements into destination v plane, increment plane pointer */
                        "subs r2, r2, #1n"              /* Decrement the loop counter */
                    "bgt loopYUYVn"                     /* Loop until entire frame is processed */
                "POP {r5}n"                             /* Restore callee-save registers */
                "POP {r4}n"
    );
}