Access violation in _mm_store_si128 (SSE intrinsics)

Keywords: si128, SSE, intrinsics, store, mm, access violation. Updated: 2023-10-16

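I want to create a histogram of vertical gradients in an 8-bit grayscale image. The vertical distance over which the gradient is computed can be specified. I have already managed to speed up another part of my code using intrinsics, but it does not work here. If the _mm_store_si128 call is commented out, the code runs without an exception. When it is not commented out, I get an access violation.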

What is going wrong here?

#define _mm_absdiff_epu8(a,b) _mm_adds_epu8(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a)) //from opencv
void CreateAbsDiffHistogramUnmanaged(void* source, unsigned int sourcestride, unsigned int height, unsigned int verticalDistance, unsigned int histogram[])
{
unsigned int xcount = sourcestride / 16;
__m128i absdiffData;
unsigned char* bytes = (unsigned char*) _aligned_malloc(16, 16);
__m128i* absdiffresult = (__m128i*) bytes;
__m128i* sourceM = (__m128i*) source;
__m128i* sourceVOffset = (__m128i*)source + verticalDistance * sourcestride;
for (unsigned int y = 0; y < (height - verticalDistance); y++)
{
    for (unsigned int x = 0; x < xcount; x++, ++sourceM, ++sourceVOffset)
    {
        absdiffData = _mm_absdiff_epu8(*sourceM, *sourceVOffset);
        _mm_store_si128(absdiffresult, absdiffData);
        //unroll loop
        histogram[bytes[0]]++;
        histogram[bytes[1]]++;
        histogram[bytes[2]]++;
        histogram[bytes[3]]++;
        histogram[bytes[4]]++;
        histogram[bytes[5]]++;
        histogram[bytes[6]]++;
        histogram[bytes[7]]++;
        histogram[bytes[8]]++;
        histogram[bytes[9]]++;
        histogram[bytes[10]]++;
        histogram[bytes[11]]++;
        histogram[bytes[12]]++;
        histogram[bytes[13]]++;
        histogram[bytes[14]]++;
        histogram[bytes[15]]++;
    }
}
_aligned_free(bytes);
}

Your function crashes on the load, because the input data is not properly aligned. To fix this, change the code from:
absdiffData = _mm_absdiff_epu8(*sourceM, *sourceVOffset);

to:

absdiffData = _mm_absdiff_epu8(_mm_loadu_si128(sourceM), _mm_loadu_si128(sourceVOffset));

Here I use unaligned loads.
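To illustrate the unaligned-load fix in isolation, here is a minimal sketch. The helpers `IsAligned16`, `AbsDiff16`, and `DemoMisaligned` are hypothetical names made up for this example and are not part of the original code. The sketch assumes an x86 target with SSE2 and deliberately feeds the intrinsics pointers that are not on a 16-byte boundary.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool IsAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: out[i] = |a[i] - b[i]| for 16 bytes.
// None of the three pointers needs any particular alignment.
inline void AbsDiff16(const std::uint8_t* a, const std::uint8_t* b,
                      std::uint8_t* out) {
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // One saturating difference is zero, the other is |a - b|.
    __m128i d = _mm_adds_epu8(_mm_subs_epu8(va, vb), _mm_subs_epu8(vb, va));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), d);
}

// Runs AbsDiff16 on deliberately misaligned inputs and checks the result.
inline bool DemoMisaligned() {
    alignas(16) std::uint8_t bufA[32] = {};
    alignas(16) std::uint8_t bufB[32] = {};
    std::uint8_t* a = bufA + 1;  // 1 byte past a 16-byte boundary
    std::uint8_t* b = bufB + 3;  // 3 bytes past a 16-byte boundary
    assert(!IsAligned16(a) && !IsAligned16(b));

    for (int i = 0; i < 16; ++i) {
        a[i] = static_cast<std::uint8_t>(10 * i);  // 0, 10, ..., 150
        b[i] = 100;
    }

    std::uint8_t out[16];
    AbsDiff16(a, b, out);

    for (int i = 0; i < 16; ++i) {
        int expected = 10 * i - 100;
        if (expected < 0) expected = -expected;
        if (out[i] != expected) return false;
    }
    return true;
}
```

Dereferencing a `__m128i*` directly, as in `*sourceM`, lets the compiler emit an aligned load, which faults on pointers like `a` and `b` above. If the caller can guarantee 16-byte alignment (which `IsAligned16` can check), `_mm_load_si128` remains an option.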

P.S. I have implemented a similar function (SimdAbsSecondDerivativeHistogram) in the Simd Library. It has SSE2, AVX2, NEON, and AltiVec implementations. I hope it helps you.

P.P.S. I also strongly recommend checking this line:

__m128i* sourceVOffset = (__m128i*)source + verticalDistance * sourcestride;

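It can cause a crash (memory access beyond the bounds of the input array): because the cast is applied before the addition, the offset is counted in 16-byte __m128i elements rather than in bytes. Perhaps you meant: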

__m128i* sourceVOffset = (__m128i*)((char*)source + verticalDistance * sourcestride);
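Combining the two suggestions (unaligned loads and a row offset computed in bytes), the whole function might look roughly like the sketch below. This is not the original poster's final code. The names `AbsDiffHistogramSketch` and `AbsDiffHistogramScalar` are hypothetical, the stride is assumed to be a multiple of 16, the histogram is assumed to have 256 entries, and a guard against `verticalDistance >= height` is added because `height - verticalDistance` would otherwise wrap around in unsigned arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Sketch: histogram of |row[y] - row[y + verticalDistance]| over the image.
void AbsDiffHistogramSketch(const std::uint8_t* source, std::size_t stride,
                            std::size_t height, std::size_t verticalDistance,
                            std::uint32_t histogram[256]) {
    assert(stride % 16 == 0);                // assumption of this sketch
    if (verticalDistance >= height) return;  // avoid unsigned wrap-around

    alignas(16) std::uint8_t bytes[16];      // aligned scratch, no heap needed
    for (std::size_t y = 0; y + verticalDistance < height; ++y) {
        const std::uint8_t* row = source + y * stride;
        // Offset is computed on a byte pointer, so it is measured in bytes.
        const std::uint8_t* rowBelow = row + verticalDistance * stride;
        for (std::size_t x = 0; x < stride; x += 16) {
            __m128i a = _mm_loadu_si128(
                reinterpret_cast<const __m128i*>(row + x));
            __m128i b = _mm_loadu_si128(
                reinterpret_cast<const __m128i*>(rowBelow + x));
            __m128i d = _mm_adds_epu8(_mm_subs_epu8(a, b),
                                      _mm_subs_epu8(b, a));
            // Aligned store is safe here: bytes is declared alignas(16).
            _mm_store_si128(reinterpret_cast<__m128i*>(bytes), d);
            for (int i = 0; i < 16; ++i) ++histogram[bytes[i]];
        }
    }
}

// Scalar reference with the same semantics, useful for checking the result.
void AbsDiffHistogramScalar(const std::uint8_t* source, std::size_t stride,
                            std::size_t height, std::size_t verticalDistance,
                            std::uint32_t histogram[256]) {
    if (verticalDistance >= height) return;
    for (std::size_t y = 0; y + verticalDistance < height; ++y) {
        const std::uint8_t* row = source + y * stride;
        const std::uint8_t* rowBelow = row + verticalDistance * stride;
        for (std::size_t x = 0; x < stride; ++x) {
            int d = static_cast<int>(row[x]) - static_cast<int>(rowBelow[x]);
            ++histogram[d < 0 ? -d : d];
        }
    }
}
```

Comparing the two histograms on a small test image is a cheap way to confirm that the SIMD path reads the rows it is supposed to read. Like the original, the sketch processes the full stride of each row, including any padding bytes.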