Successfully enabling -fno-finite-math-only on NaN removal method

Last updated: 2023-10-16

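While tracking down a bug in which everything turned into NaNs when running the optimized version of my code (compiled with g++ 4.8.2 and 4.9.3), I found that the problem was the -Ofast option, specifically the -ffinite-math-only flag it includes.

One part of the code reads floats from a FILE* using fscanf and then replaces all NaNs with a numeric value. As could be expected, however, -ffinite-math-only kicks in and removes these checks, leaving the NaNs in place.

While trying to solve this, I stumbled upon a related Stack Overflow question, which suggested adding -fno-finite-math-only as a function attribute to disable the optimization for a specific function. The following illustrates the problem and the attempted fix (which doesn't actually fix it):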
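To make the failure mode concrete, here is a minimal sketch of the kind of check that can be optimized away. The helper names `is_nan_by_self_compare` and `count_nans` are hypothetical and not part of my actual code. Under IEEE 754, NaN is the only value that compares unequal to itself, but -ffinite-math-only allows the compiler to assume that no NaN ever occurs, so it may fold the comparison to false.

```cpp
#include <cassert>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "assumes IEEE 754 floats");

// Hypothetical helper, for illustration only.
// Without -ffinite-math-only this returns true exactly for NaN inputs.
// With the flag, the compiler may assume x is never NaN and fold this to false.
bool is_nan_by_self_compare(float x) {
    return x != x;
}

// Counts the NaNs in a buffer using the self-comparison above.
int count_nans(const float* arr, int size) {
    assert(size >= 0);
    int n = 0;
    for (int i = 0; i < size; i++)
        if (is_nan_by_self_compare(arr[i])) n++;
    return n;
}
```

Built without -ffinite-math-only, `count_nans` reports the NaNs as expected; built with it, the result is no longer reliable.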

#include <cstdio>
#include <cmath>

// Attempted fix: re-enable NaN handling for this one function only.
__attribute__((optimize("-fno-finite-math-only")))
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}

int main(void){
    const int cnt = 10;  // int, to match the loop counters and replaceNaN's size parameter
    float val[cnt];
    for(int i = 0; i < cnt; i++) scanf("%f", val + i);
    replaceNaN(val, cnt, -1.0f);
    for(int i = 0; i < cnt; i++) printf("%f ", val[i]);
    return 0;
}

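If compiled and run using echo 1 2 3 4 5 6 7 8 nan 10 | (g++ -ffinite-math-only test.cpp -o test && ./test), the code does not behave as expected: it outputs a nan (which should have been replaced by -1.0f). It behaves fine if the -ffinite-math-only flag is omitted. Shouldn't this work? Am I missing something about the syntax for attributes in gcc, or is this one of the aforementioned cases of "there being some trouble with some version of GCC related to this" (from the linked SO question)?

A few solutions I am aware of, although I would prefer something a bit cleaner and more portable:

Compile the code with -fno-finite-math-only (my interim solution): I suspect that this optimization may be rather useful in the context of the rest of the program.
Manually look for the string "nan" in the input stream and replace the value there (the input reader is in an unrelated part of the library, so it would be poor design to include this test in that part).
Assume a specific floating-point architecture and write my own isNaN: I may do this, but it is a bit hackish and non-portable.
Prefilter the data using a separately compiled program without the -ffinite-math-only flag, and then feed the result into the main program: the added complexity of maintaining two binaries and getting them to talk to each other just isn't worth it.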
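For reference, the second option (detecting NaN at the text level) could look roughly like the sketch below. The names `token_is_nan` and `parse_or_replace` are hypothetical, and the sketch assumes the reader has access to each value as a whitespace-delimited token, which is not how my actual reader is structured. Because it never asks a floating-point question about the value, -ffinite-math-only has nothing to remove.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true if the token spells a NaN, i.e. an
// optional sign followed by "nan" (case-insensitive), possibly followed
// by further characters such as "(...)".
bool token_is_nan(const std::string& token) {
    std::size_t pos = 0;
    if (pos < token.size() && (token[pos] == '+' || token[pos] == '-')) pos++;
    assert(pos <= token.size());  // guards the unsigned subtraction below
    if (token.size() - pos < 3) return false;
    return std::tolower(static_cast<unsigned char>(token[pos])) == 'n'
        && std::tolower(static_cast<unsigned char>(token[pos + 1])) == 'a'
        && std::tolower(static_cast<unsigned char>(token[pos + 2])) == 'n';
}

// Parses one token, substituting `fallback` when the text itself says NaN.
float parse_or_replace(const std::string& token, float fallback) {
    if (token_is_nan(token)) return fallback;
    return std::strtof(token.c_str(), nullptr);
}
```

This only covers NaNs that arrive as text; NaNs produced later by arithmetic would still slip through.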

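Edit: As stated in the accepted answer, this appears to be a compiler "bug" in older versions of g++, such as 4.8.2 and 4.9.3, that has been fixed in newer versions, such as 5.1 and 6.1.1.

If, for some reason, updating the compiler is not a reasonably easy option (e.g., no root access), or adding this attribute to a single function still does not completely resolve the NaN check issue, an alternative solution, provided you can be certain that the code will always run in an IEEE754 floating-point environment, is to manually check the bits of the float for a NaN signature.

The accepted answer suggests doing this with a bit field. However, the order in which the compiler places the elements of a bit field is non-standard and in fact changes between older and newer versions of g++. The older versions (4.8.2 and 4.9.3, which always place the mantissa first) even refuse to adhere to the desired positioning, regardless of the order in which the members appear in the code.

A solution using bit manipulation, however, is guaranteed to work on all IEEE754-compliant compilers. Below is my implementation of it, which I ultimately used to solve my problem. It checks for IEEE754 compliance, and I have extended it to allow for doubles, as well as other more routine floating-point bit operations.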
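As a minimal sketch of what "checking the bits for a NaN signature" means, assuming 32-bit IEEE 754 floats: a NaN has all eight exponent bits set and a non-zero mantissa, so once the sign bit is cleared its bit pattern is strictly greater than that of +infinity (0x7F800000). The names `is_nan_bits` and `replace_nan_bits` are hypothetical, and the sketch copies the bits out with std::memcpy, which is a different technique from both the bit field and the union used elsewhere on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "requires IEEE 754 floats");
static_assert(sizeof(float) == sizeof(std::uint32_t), "requires 32-bit float");

// Hypothetical helper: NaN test on the raw bit pattern.
bool is_nan_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);    // copy the object representation
    return (bits & 0x7FFFFFFFu) > 0x7F800000u;  // exponent all ones, mantissa != 0
}

// Same contract as replaceNaN, but using the bit test.
void replace_nan_bits(float* arr, int size, float newValue) {
    assert(size >= 0);
    for (int i = 0; i < size; i++)
        if (is_nan_bits(arr[i])) arr[i] = newValue;
}
```

Infinities are left untouched because their mantissa is zero, and the integer comparison is not subject to floating-point optimization flags.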

#include <cstddef>     // size_t
#include <cstdint>     // uint32_t, uint64_t
#include <limits>      // IEEE754 compliance test
#include <type_traits> // enable_if
template<
    typename T, 
    typename = typename std::enable_if<std::is_floating_point<T>::value>::type,
    typename = typename std::enable_if<std::numeric_limits<T>::is_iec559>::type,
    typename u_t = typename std::conditional<std::is_same<T, float>::value, uint32_t, uint64_t>::type
>
struct IEEE754 {
    enum class WIDTH : size_t {
        SIGN = 1, 
        EXPONENT = std::is_same<T, float>::value ? 8 : 11,
        MANTISSA = std::is_same<T, float>::value ? 23 : 52
    };
    enum class MASK : u_t {
        SIGN = (u_t)1 << (sizeof(u_t) * 8 - 1),
        EXPONENT = ((~(u_t)0) << (size_t)WIDTH::MANTISSA) ^ (u_t)MASK::SIGN,
        MANTISSA = (~(u_t)0) >> ((size_t)WIDTH::SIGN + (size_t)WIDTH::EXPONENT)
    };
    union {
        T f;
        u_t u;
    };
    IEEE754(T f) : f(f) {}
    // Mask first, then shift: >> binds more tightly than &, so the parentheses are required.
    inline u_t sign() const { return (u & (u_t)MASK::SIGN) >> ((size_t)WIDTH::EXPONENT + (size_t)WIDTH::MANTISSA); }
    inline u_t exponent() const { return (u & (u_t)MASK::EXPONENT) >> (size_t)WIDTH::MANTISSA; }
    inline u_t mantissa() const { return u & (u_t)MASK::MANTISSA; }
    inline bool isNan() const {
        return (mantissa() != 0) && ((u & ((u_t)MASK::EXPONENT)) == (u_t)MASK::EXPONENT);
    }
};
template<typename T>
inline IEEE754<T> toIEEE754(T val) { return IEEE754<T>(val); }

The replaceNaN function now becomes:

void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) 
        if (toIEEE754(arr[i]).isNan()) arr[i] = newValue;
}

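Inspecting the assembly for these functions shows that, as expected, all masks become compile-time constants, resulting in the following (seemingly) efficient code: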

# In loop of replaceNaN
movl    (%rcx), %eax       # eax = arr[i] 
testl   $8388607, %eax     # Check if mantissa is empty
je  .L3                    # If it is, it's not a nan (it's inf), continue loop
andl    $2139095040, %eax  # Mask leaves only exponent
cmpl    $2139095040, %eax  # Test if exponent is all 1s
jne .L3                    # If it isn't, it's not a nan, so continue loop

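This is one instruction fewer than the working bit-field solution (no shift) and uses the same number of registers. It is tempting to say that this alone makes it more efficient, but there are other concerns, such as pipelining, that may make one solution more or less efficient than the other.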
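The same mask-and-compare test carries over to double precision, which is the extension mentioned above. The following is a simplified sketch rather than the template itself: it assumes 64-bit IEEE 754 doubles (1 sign bit, 11 exponent bits, 52 mantissa bits), and the names `is_nan_bits64` and `replace_nan_bits64` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "requires IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t), "requires 64-bit double");

// Hypothetical helper: a binary64 NaN has all 11 exponent bits set
// and at least one of the 52 mantissa bits set.
bool is_nan_bits64(double value) {
    const std::uint64_t exponent_mask = std::uint64_t(0x7FF) << 52;
    const std::uint64_t mantissa_mask = (std::uint64_t(1) << 52) - 1;
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the object representation
    return (bits & exponent_mask) == exponent_mask
        && (bits & mantissa_mask) != 0;
}

// Double-precision counterpart of replaceNaN.
void replace_nan_bits64(double* arr, int size, double newValue) {
    assert(size >= 0);
    for (int i = 0; i < size; i++)
        if (is_nan_bits64(arr[i])) arr[i] = newValue;
}
```

Both masks are constants, so a compiler should be able to reduce the test to a couple of integer instructions, much like the single-precision listing.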

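Looks like a compiler bug to me. Up through GCC 4.9.2, the attribute is completely ignored. GCC 5.1 and later pay attention to it. Perhaps it's time to upgrade your compiler?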

__attribute__((optimize("-fno-finite-math-only"))) 
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}

Compiled with -ffinite-math-only on GCC 4.9.2:

replaceNaN(float*, int, float):
        rep ret

But with the exact same settings on GCC 5.1:

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L26
        sub     rsp, 8
        call    std::isnan(float) [clone .isra.0]
        test    al, al
        je      .L2
        mov     rax, rdi
        and     eax, 15
        shr     rax, 2
        neg     rax
        and     eax, 3
        cmp     eax, esi
        cmova   eax, esi
        cmp     esi, 6
        jg      .L28
        mov     eax, esi
.L5:
        cmp     eax, 1
        movss   DWORD PTR [rdi], xmm0
        je      .L16
        cmp     eax, 2
        movss   DWORD PTR [rdi+4], xmm0
        je      .L17
        cmp     eax, 3
        movss   DWORD PTR [rdi+8], xmm0
        je      .L18
        cmp     eax, 4
        movss   DWORD PTR [rdi+12], xmm0
        je      .L19
        cmp     eax, 5
        movss   DWORD PTR [rdi+16], xmm0
        je      .L20
        movss   DWORD PTR [rdi+20], xmm0
        mov     edx, 6
.L7:
        cmp     esi, eax
        je      .L2
.L6:
        mov     r9d, esi
        lea     r8d, [rsi-1]
        mov     r11d, eax
        sub     r9d, eax
        lea     ecx, [r9-4]
        sub     r8d, eax
        shr     ecx, 2
        add     ecx, 1
        cmp     r8d, 2
        lea     r10d, [0+rcx*4]
        jbe     .L9
        movaps  xmm1, xmm0
        lea     r8, [rdi+r11*4]
        xor     eax, eax
        shufps  xmm1, xmm1, 0
.L11:
        add     eax, 1
        add     r8, 16
        movaps  XMMWORD PTR [r8-16], xmm1
        cmp     ecx, eax
        ja      .L11
        add     edx, r10d
        cmp     r9d, r10d
        je      .L2
.L9:
        movsx   rax, edx
        movss   DWORD PTR [rdi+rax*4], xmm0
        lea     eax, [rdx+1]
        cmp     eax, esi
        jge     .L2
        add     edx, 2
        cdqe
        cmp     esi, edx
        movss   DWORD PTR [rdi+rax*4], xmm0
        jle     .L2
        movsx   rdx, edx
        movss   DWORD PTR [rdi+rdx*4], xmm0
.L2:
        add     rsp, 8
.L26:
        rep ret
.L28:
        test    eax, eax
        jne     .L5
        xor     edx, edx
        jmp     .L6
.L20:
        mov     edx, 5
        jmp     .L7
.L19:
        mov     edx, 4
        jmp     .L7
.L18:
        mov     edx, 3
        jmp     .L7
.L17:
        mov     edx, 2
        jmp     .L7
.L16:
        mov     edx, 1
        jmp     .L7

The output on GCC 6.1 is similar, although not quite identical.

Replacing the attribute with

#pragma GCC push_options
#pragma GCC optimize ("-fno-finite-math-only")
void replaceNaN(float * arr, int size, float newValue){
    for(int i = 0; i < size; i++) if (std::isnan(arr[i])) arr[i] = newValue;
}
#pragma GCC pop_options

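makes absolutely no difference, so it is not simply a matter of the attribute being ignored. These older versions of the compiler clearly do not support controlling floating-point optimization behavior at function-level granularity.

Note, however, that the code generated on GCC 5.1 and later is still significantly worse than what you get by compiling the function without the -ffinite-math-only switch: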

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L1
        lea     eax, [rsi-1]
        lea     rax, [rdi+4+rax*4]
.L5:
        movss   xmm1, DWORD PTR [rdi]
        ucomiss xmm1, xmm1
        jnp     .L6
        movss   DWORD PTR [rdi], xmm0
.L6:
        add     rdi, 4
        cmp     rdi, rax
        jne     .L5
        rep ret
.L1:
        rep ret

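I don't know why there is such a discrepancy. Something is badly throwing the compiler off its game; this is even worse than the code you get with optimizations completely disabled. If I had to guess, I would speculate that it is the implementation of std::isnan. If this replaceNaN method is not speed-critical, then it probably doesn't matter. If you need to parse values from a file repeatedly, you may prefer to have a reasonably efficient implementation.

Personally, I would write my own non-portable implementation of std::isnan. The IEEE 754 formats are all quite well documented, and assuming you thoroughly test and comment the code, I can't see the harm in this, unless you absolutely need the code to be portable to all different architectures. It will drive the purists up the wall, but so should using non-standard options like -ffinite-math-only. For a single-precision float, something like: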

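#include <cstdint> // uint32_t

bool my_isnan(float value)
{
  union IEEE754_Single
  {
    float f;
    struct
    {
    // Bit-field layout is implementation-defined. This test relies on the
    // byte-order macros predefined by GCC; a bare BIG_ENDIAN is unsafe
    // because some system headers define it as a non-zero constant.
    #if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__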
        uint32_t sign     : 1;
        uint32_t exponent : 8;
        uint32_t mantissa : 23;
    #else
        uint32_t mantissa : 23;
        uint32_t exponent : 8;
        uint32_t sign     : 1;
    #endif
    } bits;
  } u = { value };
  // In the IEEE 754 representation, a float is NaN when
  // the mantissa is non-zero, and the exponent is all ones
  // (2^8 - 1 == 255).
  return (u.bits.mantissa != 0) && (u.bits.exponent == 255);
}

Now there is no need for the attribute; just use my_isnan instead of std::isnan. When compiled with -ffinite-math-only enabled, the following object code is produced:

replaceNaN(float*, int, float):
        test    esi, esi
        jle     .L6
        lea     eax, [rsi-1]
        lea     rdx, [rdi+4+rax*4]
.L13:
        mov     eax, DWORD PTR [rdi]     ; get original floating-point value
        test    eax, 8388607             ; test if mantissa != 0
        je      .L9
        shr     eax, 16                  ; test if exponent has all bits set
        and     ax, 32640
        cmp     ax, 32640
        jne     .L9
        movss   DWORD PTR [rdi], xmm0    ; set newValue if original was NaN
.L9:
        add     rdi, 4
        cmp     rdx, rdi
        jne     .L13
        rep ret
.L6:
        rep ret

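The NaN check is slightly more complicated than a simple ucomiss followed by a test of the parity flag, but it is guaranteed to be correct as long as your compiler adheres to the IEEE 754 standard. This works on all versions of GCC and on any other compiler.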
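If you would rather keep the cheap std::isnan path in ordinary builds and pay for the bit test only when it is needed, one possibility is to select the implementation at compile time. This is a sketch under assumptions: `safe_isnan` is a hypothetical name, the bit test assumes 32-bit IEEE 754 floats, and it relies on the __FINITE_MATH_ONLY__ macro that GCC predefines (1 when -ffinite-math-only is in effect, 0 otherwise). It reads the bits with std::memcpy rather than the bit field shown above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
// std::isnan cannot be trusted in this mode, so inspect the bits instead.
inline bool safe_isnan(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return (bits & 0x7FFFFFFFu) > 0x7F800000u;  // exponent all ones, mantissa != 0
}
#else
// Normal mode: the standard check works and compiles to ucomiss + parity test.
inline bool safe_isnan(float value) { return std::isnan(value); }
#endif

void replaceNaN(float* arr, int size, float newValue) {
    assert(size >= 0);
    for (int i = 0; i < size; i++)
        if (safe_isnan(arr[i])) arr[i] = newValue;
}
```

With this arrangement the call sites stay the same whichever flags the translation unit is built with.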