Memory alignment for SSE in C++, _aligned_malloc equivalent?
I'd like to know how to convert this C code, which allocates 16-byte-aligned memory, to C++.
float *pResult = (float*) _aligned_malloc(length * sizeof(float), 16);
I did look around, and I tried float *pResult = (float*) __attribute__((aligned(16)));
and this:
float *pResult = __attribute__((aligned(16)));
but both gave similar errors:
error: expected primary-expression before '__attribute__'|
error: expected ',' or ';' before '__attribute__'|
Full code:
#include "stdafx.h"
#include <xmmintrin.h> // Need this for SSE compiler intrinsics
#include <math.h> // Needed for sqrt in CPU-only version
#include "stdio.h"
int main(int argc, char* argv[])
{
    printf("Starting calculation...\n");
    const int length = 64000;
    // We will be calculating Y = sqrt(x) / x, for x = 1->64000
    // If you do not properly align your data for SSE instructions, you may take a huge performance hit.
    float *pResult = (float*) __attribute__((aligned(16))); // align to 16-byte for SSE
    __m128 x;
    __m128 xDelta = _mm_set1_ps(4.0f); // Set the xDelta to (4,4,4,4)
    __m128 *pResultSSE = (__m128*) pResult;
    const int SSELength = length / 4;
    for (int stress = 0; stress < 100000; stress++) // lots of stress loops so we can easily use a stopwatch
    {
#define TIME_SSE // Define this if you want to run with SSE
#ifdef TIME_SSE
        x = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f); // Set the initial values of x to (4,3,2,1)
        for (int i = 0; i < SSELength; i++)
        {
            __m128 xSqrt = _mm_sqrt_ps(x);
            // Note! Division is slow. It's actually faster to take the reciprocal of a number and multiply
            // Also note that Division is more accurate than taking the reciprocal and multiplying
#define USE_DIVISION_METHOD
#ifdef USE_FAST_METHOD
            __m128 xRecip = _mm_rcp_ps(x);
            pResultSSE[i] = _mm_mul_ps(xRecip, xSqrt);
#endif // USE_FAST_METHOD
#ifdef USE_DIVISION_METHOD
            pResultSSE[i] = _mm_div_ps(xSqrt, x);
#endif // USE_DIVISION_METHOD
            // NOTE! Sometimes, the order in which things are done in SSE may seem reversed.
            // When the command above executes, the four floating elements are actually flipped around
            // We have already compensated for that flipping by setting the initial x vector to (4,3,2,1) instead of (1,2,3,4)
            x = _mm_add_ps(x, xDelta); // Advance x to the next set of numbers
        }
#endif // TIME_SSE
#ifndef TIME_SSE
        float xFloat = 1.0f;
        for (int i = 0; i < length; i++)
        {
            pResult[i] = sqrt(xFloat) / xFloat; // Even though division is slow, there are no intrinsic functions like there are in SSE
            xFloat += 1.0f;
        }
#endif // !TIME_SSE
    }
    // To prove that the program actually worked
    for (int i = 0; i < 20; i++)
    {
        printf("Result[%d] = %f\n", i, pResult[i]);
    }
    // Results for my particular system
    // 23.75 seconds for SSE with reciprocal/multiplication method
    // 38.5 seconds for SSE with division method
    // 301.5 seconds for CPU
    return 0;
}
With C++11, you can use the following:
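#include <vector>

// Four floats packed into one 16-byte-aligned unit (the size of one __m128)
struct aligned_float
{
    alignas(16) float f[4];
};
static_assert(sizeof(aligned_float) == 4 * sizeof(float), "padding issue");
int main()
{
    const int length = 64000; // number of floats, as in the question
    // Each element holds 4 floats, so length / 4 elements are needed
    std::vector<aligned_float> pResult(length / 4);
    return 0;
}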
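The only portable way I know of to solve this is to use a wrapper that allocates more memory than is needed and masks off the low bits of the address, so that the pointer it returns satisfies the required alignment.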
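The wrapper approach described above might look roughly like the following. This is only a sketch: the names aligned_malloc_sketch and aligned_free_sketch are made up for illustration, and it assumes the alignment is a power of two. It over-allocates with std::malloc, rounds the address up to the next aligned boundary by masking, and stores the original pointer just in front of the aligned block so it can be freed later.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not a standard function): returns `size` bytes whose
// address is a multiple of `alignment`, or nullptr on failure.
// Assumption: `alignment` is a power of two.
void* aligned_malloc_sketch(std::size_t size, std::size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return nullptr; // not a power of two
    if (alignment < alignof(void*))
        alignment = alignof(void*); // keeps the stored pointer itself aligned

    // Worst-case padding (alignment - 1) plus room for one bookkeeping pointer.
    const std::size_t extra = alignment - 1 + sizeof(void*);
    if (size > SIZE_MAX - extra)
        return nullptr; // size + extra would overflow

    void* raw = std::malloc(size + extra);
    if (raw == nullptr)
        return nullptr;

    // Skip past the bookkeeping slot, then round up and mask off the low bits.
    const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t mask = ~static_cast<std::uintptr_t>(alignment - 1);
    const std::uintptr_t aligned = (start + alignment - 1) & mask;

    // Remember the pointer malloc returned, right before the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical counterpart: frees memory obtained from aligned_malloc_sketch.
void aligned_free_sketch(void* p)
{
    if (p != nullptr)
        std::free(static_cast<void**>(p)[-1]); // recover the original pointer
}
```

With these two helpers, the line from the question would become float *pResult = (float*) aligned_malloc_sketch(length * sizeof(float), 16); and the buffer must be released with aligned_free_sketch(pResult), not with free or delete.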
See here:
http://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html
Glibc provides aligned_alloc().
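As a rough illustration of that route, the sketch below uses std::aligned_alloc, which C++17 exposes through <cstdlib> on glibc-based toolchains (to my knowledge MSVC does not provide it, so this is not an option there). The helper name alloc_floats_16 is made up for this example. Because C11 requires the size to be a multiple of the alignment, the byte count is rounded up first.

```cpp
#include <cstddef>
#include <cstdlib> // std::aligned_alloc (C++17), std::free

// Hypothetical helper: allocates room for `count` floats on a 16-byte boundary.
// Returns nullptr if `count` is zero or the allocation fails.
float* alloc_floats_16(std::size_t count)
{
    const std::size_t alignment = 16;
    if (count == 0 || count > (static_cast<std::size_t>(-1) - alignment) / sizeof(float))
        return nullptr; // nothing to allocate, or the byte count would overflow

    // Round the byte count up to a multiple of the alignment.
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + alignment - 1) / alignment * alignment;

    return static_cast<float*>(std::aligned_alloc(alignment, bytes));
}
```

Memory obtained this way is released with plain std::free, for example float *pResult = alloc_floats_16(length); followed later by std::free(pResult);.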