FFT using Intel MKL and Intel IPP

Keywords: Intel, IPP, MKL, FFT. Updated: 2023-10-16

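I have complex data of size 1024*128*20. I need to compute a 1024-point FFT for each of the 128*20 blocks. I plan to use Intel MKL or Intel IPP for this. Is it possible to parallelize the code with Intel MKL or IPP? Which of the two, MKL or IPP, would give the shortest computation time?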

I suggest you read https://software.intel.com/en-us/articles/mkl-ipp-choosing-an-fft/ which gives a good comparison and makes it easier to decide which one better fits your use case.

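Both IPP and MKL can do the job, but which one is faster may depend on your hardware, because they are optimized differently. For example, IPP's FFT functions only work on arrays whose length is a power of 2, while MKL may be more general (according to the article).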

(Sorry for bumping an "old" question, but no answer has been accepted yet and the question is still relevant.)

I think their performance is the same, since both are developed by Intel. I prefer MKL because it has more users.

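Both MKL and IPP support parallel FFTs. However, I suggest you apply parallelism at a higher level, since you have many FFT blocks to compute. For each 1024-point FFT, you can use the sequential version in MKL.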
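To illustrate the idea of parallelizing across blocks rather than inside one transform, here is a minimal sketch. It is not MKL code: `fftSequential` is a hand-written radix-2 stand-in for whatever sequential library FFT you would really call per block, and both function names are made up for this illustration. It also assumes each block is contiguous in memory. Built with OpenMP enabled (for example `-fopenmp` with GCC), the outer loop is shared among threads; without it, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Stand-in for a sequential library FFT (hypothetical helper):
// in-place iterative radix-2 forward transform, n must be a power of two.
void fftSequential(Complex* x, std::size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    // Butterfly stages with forward twiddle factors exp(-2*pi*i/len).
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const Complex wLen = std::polar(1.0, -2.0 * pi / static_cast<double>(len));
        for (std::size_t i = 0; i < n; i += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex u = x[i + k];
                const Complex v = x[i + k + len / 2] * w;
                x[i + k] = u + v;
                x[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
}

// Transforms every contiguous block of fftLength samples in place.
// Blocks are independent, so the loop over blocks can be shared among threads
// while each individual transform stays sequential.
void fftBlocksParallel(std::vector<Complex>& data, std::size_t fftLength)
{
    assert(fftLength > 0 && data.size() % fftLength == 0);
    const long long numBlocks = static_cast<long long>(data.size() / fftLength);
    Complex* base = data.data();
#pragma omp parallel for
    for (long long b = 0; b < numBlocks; ++b) {
        fftSequential(base + static_cast<std::size_t>(b) * fftLength, fftLength);
    }
}
```

For the data in the question, `data` would hold 128*20 = 2560 blocks of 1024 samples, and `fftBlocksParallel(data, 1024)` would process all of them.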

Intel offers a recommended solution for multiple FFTs with identical parameters: https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/fourier-transform-functions/fft-functions/configuration-settings/dfti-number-of-transforms.html

The point is that you hand it the entire data set and it takes care of the parallelization.

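Watch out for conjugate-even symmetry, though: for real input, the output of each transform holds only the non-redundant half of the spectrum.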
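For a real input of length N, the spectrum satisfies X[N-k] = conj(X[k]), so only the first N/2 + 1 bins carry independent information. If you need the full N-point spectrum, it can be rebuilt from that half. A minimal sketch, where `expandConjugateEven` is a hypothetical helper name and not part of MKL or IPP:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Rebuilds the full n-point spectrum of a real signal from its first
// n/2 + 1 bins, using X[n - k] = conj(X[k]).
// Hypothetical helper for illustration only.
std::vector<std::complex<double>> expandConjugateEven(
    const std::vector<std::complex<double>>& half, std::size_t n)
{
    assert(n > 0 && half.size() >= n / 2 + 1);
    std::vector<std::complex<double>> full(n);
    // Bins 0 .. n/2 are stored explicitly.
    for (std::size_t k = 0; k <= n / 2; ++k) {
        full[k] = half[k];
    }
    // Remaining bins are mirror images of the stored ones.
    for (std::size_t k = n / 2 + 1; k < n; ++k) {
        full[k] = std::conj(half[n - k]);
    }
    return full;
}
```

With batched transforms, this would be applied to each transform's output block separately.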

Here is a minimal example:

#include <mkl.h>
#include <complex>
#include <fstream>
#include <vector>

int main()
{
    // MKL_LONG is the integer type the variadic DftiSetValue expects for these settings.
    const MKL_LONG inputLength = 1024;
    const MKL_LONG numOfTransforms = 8;
    std::vector<double> inputData(numOfTransforms * inputLength, 0.0);
    std::vector<std::complex<double>> spectrum(inputLength * numOfTransforms);
    // ...
    // This is where you fill your matrix with useful data
    // ...
    // "input.bin" is a placeholder file name, assumed to contain raw doubles.
    std::ifstream file("input.bin", std::ios::binary);
    file.read(reinterpret_cast<char *>(inputData.data()), sizeof(double) * numOfTransforms * inputLength);
    // At this point, input data contains 8 arrays in one, row-major.
    DFTI_DESCRIPTOR_HANDLE fftHandle = nullptr;
    // Creating a handle with double precision, real input, and along 1st dimension of length inputLength
    auto status = DftiCreateDescriptor(&fftHandle, DFTI_DOUBLE, DFTI_REAL, 1, inputLength);
    // Compute all transforms in one call; distances are counted in elements.
    status = DftiSetValue(fftHandle, DFTI_NUMBER_OF_TRANSFORMS, numOfTransforms);
    status = DftiSetValue(fftHandle, DFTI_INPUT_DISTANCE, inputLength);
    status = DftiSetValue(fftHandle, DFTI_OUTPUT_DISTANCE, inputLength);
    status = DftiSetValue(fftHandle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    // this is important, as the default option is DFTI_COMPLEX_REAL, which is deprecated.
    status = DftiSetValue(fftHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    status = DftiCommitDescriptor(fftHandle);
    // Each transform writes inputLength / 2 + 1 complex values at the start of its output block.
    if (status == DFTI_NO_ERROR)
        status = DftiComputeForward(fftHandle, inputData.data(), spectrum.data());
    DftiFreeDescriptor(&fftHandle);
    return status == DFTI_NO_ERROR ? 0 : 1;
}