SSE _mm_load_ps causing segmentation faults

Keywords: segmentation fault, ps, mm, load, SSE. Updated: 2023-10-16

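I'm having trouble with this toy example, which I'm using to learn programming with SSE intrinsics. I've read in other threads here that a segmentation fault in _mm_load_ps is sometimes caused by incorrect alignment, but I thought the __attribute__((__aligned__(16))) I applied should take care of that. Also, when I comment out line 23 or line 24 of my code (or both), the problem goes away, but obviously the code then no longer does its job.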

#include <iostream>
using namespace std;
int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
        float temp3[8];
        __m128 m, *m_result;
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);
        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, *m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        m_result++;
        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        *m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, *m_result); 

        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}

Line 23 is arr1 = _mm_load_ps(temp1+4). It seems strange to me that I can do one or the other, but not both. Any help would be appreciated, thanks!

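Your problem is that you declare a pointer, __m128 *m_result, but never allocate any storage for it, so every write through *m_result goes to an arbitrary address. Later you also do m_result++, which moves the pointer to yet another address that was never allocated. There is no reason to use a pointer here.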

#include <xmmintrin.h>                 // SSE
#include <iostream>
using namespace std;
int main()
{
        float temp1[] __attribute__((__aligned__(16))) = {1.1,1.2,1.3,14.5,3.1,5.2,2.3,3.4};
        float temp2[] __attribute__((__aligned__(16))) = {1.2,2.3,3.4,3.5,1.2,2.3,4.2,2.2};
        float temp3[8] __attribute__((__aligned__(16)));  // _mm_store_ps needs a 16-byte aligned destination
        __m128 m, m_result;
        __m128 arr1 = _mm_load_ps(temp1);
        __m128 arr2 = _mm_load_ps(temp2);
        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m, m); 
        _mm_store_ps(temp3, m_result); 
        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        arr1 = _mm_load_ps(temp1+4);
        arr2 = _mm_load_ps(temp2+4);
        m = _mm_mul_ps(arr1, arr2);
        m_result = _mm_add_ps(m,m);
        _mm_store_ps(temp3, m_result); 

        for(int i = 0; i < 4; i++)
        {   
            cout << temp3[i] << endl;
        }   
        return 0;
}
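
If a pointer to the result really is wanted, it has to point at storage that exists. The following is only a minimal sketch of that idea, not part of the original answer: the function name twice_product8 and the results array are hypothetical, and unaligned loads and stores are used so that the sketch makes no assumption about how the caller's arrays are aligned.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>

// Hypothetical helper: computes out[i] = 2 * a[i] * b[i] for exactly 8 floats.
// Unaligned loads/stores are used, so the arrays may have any alignment.
void twice_product8(const float* a, const float* b, float* out)
{
    assert(a && b && out);

    __m128 results[2];           // real storage for the two result vectors
    __m128* m_result = results;  // the pointer now refers to valid memory

    for (int chunk = 0; chunk < 2; ++chunk) {
        __m128 x = _mm_loadu_ps(a + 4 * chunk);
        __m128 y = _mm_loadu_ps(b + 4 * chunk);
        __m128 m = _mm_mul_ps(x, y);
        *m_result = _mm_add_ps(m, m);               // write through a valid pointer
        _mm_storeu_ps(out + 4 * chunk, *m_result);
        ++m_result;  // at most one past the end of results; never dereferenced there
    }
}
```

Calling twice_product8(temp1, temp2, temp3) with the arrays from the question would fill all eight elements of temp3 instead of overwriting the first four.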

(1) m_result is just a wild pointer:

     __m128 m, *m_result;

Change all occurrences of *m_result to m_result and remove m_result++;. (m_result is just a temporary vector variable whose value is subsequently stored to temp3.)

(2) Both of your stores may be misaligned, because temp3 is not guaranteed to be 16-byte aligned. Either change:

    float temp3[8];

to:

    float temp3[8] __attribute__((__aligned__(16)));

or use _mm_storeu_ps:

    _mm_storeu_ps(temp3, m_result); 
            ^^^
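
To catch this kind of alignment mistake early, the destination address can be checked before an aligned store. This is only a sketch under stated assumptions: is_aligned16 and store4_checked are hypothetical helper names, not part of the SSE API, and the assert is active only in builds where NDEBUG is not defined.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if p lies on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: aligned store that fails loudly in debug builds
// instead of faulting somewhere inside _mm_store_ps.
inline void store4_checked(float* dst, __m128 v)
{
    assert(is_aligned16(dst));
    _mm_store_ps(dst, v);
}
```

In C++11 and later, alignas(16) float temp3[8]; is a portable way to request the same alignment as the __attribute__ form above. For addresses such as temp3 + 1, which can never be 16-byte aligned, _mm_storeu_ps remains the right call.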