How to avoid overflow in expr. A * B - C * D

Keywords: overflow, avoid, Expr. Last updated: 2023-10-16

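I need to compute an expression of the form A*B - C*D, where the types are: signed long long int A, B, C, D; Each number can be really big (without overflowing its type). While A*B alone could overflow, the whole expression A*B - C*D can be really small. How can I compute it correctly?

For example: MAX * MAX - (MAX - 1) * (MAX + 1) == 1, where MAX = LLONG_MAX - n and n is some natural number.

This seems too trivial, I guess. But A*B is the one that could overflow.

You could do the following, without losing precision: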

A*B - C*D = A(D+E) - (A+F)D
          = AD + AE - AD - DF
          = AE - DF
             ^smaller quantities E & F
E = B - D (hence, far smaller than B)
F = C - A (hence, far smaller than C)

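This decomposition can be carried further.
As @Gian pointed out, care might need to be taken during the subtraction operation if the type is unsigned long long.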

For example, with the case you have in the question, it takes just one iteration:

 MAX * MAX - (MAX - 1) * (MAX + 1)
  A     B       C           D
E = B - D = -1
F = C - A = -1
AE - DF = {MAX * -1} - {(MAX + 1) * -1} = -MAX + MAX + 1 = 1
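The single decomposition step above can be sketched in C as follows. This is only an illustration: the function name diff_of_products is made up here, and the code assumes that B - D, C - A, both reduced products and the final difference all fit in long long. It does not detect overflow itself.

```c
/* Hypothetical helper: computes A*B - C*D as A*E - D*F,
   with E = B - D and F = C - A (one step of the decomposition).
   Assumption: every intermediate value fits in long long. */
static long long diff_of_products(long long A, long long B,
                                  long long C, long long D)
{
    long long E = B - D;   /* small when B and D are close together */
    long long F = C - A;   /* small when C and A are close together */
    return A * E - D * F;  /* two much smaller products */
}
```

With MAX = LLONG_MAX - 1, calling diff_of_products(MAX, MAX, MAX - 1, MAX + 1) evaluates -MAX - (-(MAX + 1)), which is the case worked through above.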

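The simplest and most general solution is to use a representation that can't overflow, either by using a long integer library (e.g. http://gmplib.org/) or by representing the numbers with a struct or array and implementing a kind of long multiplication (i.e. separating each number into two 32-bit halves and performing the multiplication as below):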

(R1 + R2 * 2^32 + R3 * 2^64 + R4 * 2^96) = R = A*B = (A1 + A2 * 2^32) * (B1 + B2 * 2^32) 
R1 = (A1*B1) % 2^32
R2 = ((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) % 2^32
R3 = (((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) %2^32
R4 = ((((A1*B1) / 2^32 + (A1*B2) % 2^32 + (A2*B1) % 2^32) / 2^32 + (A1*B2) / 2^32 + (A2*B1) / 2^32 + (A2*B2) % 2^32) / 2^32) + (A2*B2) / 2^32

Assuming the end result fits in 64 bits, you don't actually need most of the bits of R3, and you need none of R4.
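As a rough illustration of the long multiplication described above, here is a sketch for unsigned 64-bit operands. The type u128 and the function mul_64x64 are names invented for this example, and sign handling for signed inputs is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical type: a 128-bit unsigned value stored as two 64-bit halves. */
typedef struct { uint64_t hi, lo; } u128;

/* Schoolbook multiplication of two unsigned 64-bit values via 32-bit halves. */
static u128 mul_64x64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Everything that lands in bits 32..63, plus its carry; the sum of
       three values below 2^32 cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);

    u128 r;
    r.lo = (mid << 32) | (p0 & 0xffffffffu);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}
```

Subtracting two such 128-bit products then only needs a borrow from the low half into the high half.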

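Note that this is not standard, since it relies on wrap-around signed overflow. (GCC has compiler flags which enable this.)

But if you just do all the calculations in long long, the result of applying the formula directly:
(A * B - C * D) will be accurate as long as the correct result fits into a long long.

Here's a work-around that relies only on the implementation-defined behavior of casting an unsigned integer to a signed integer. This can be expected to work on almost every system today.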

(long long)((unsigned long long)A * B - (unsigned long long)C * D)

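This casts the inputs to unsigned long long, where the overflow behavior is guaranteed by the standard to be wrap-around. Casting back to a signed integer at the end is the implementation-defined part, but it will work in nearly all environments today.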
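The same cast-based expression, wrapped in a function so it can be tried out. The name diff_of_products_wrap is hypothetical, and the result is only meaningful when the true value of A*B - C*D fits in long long.

```c
/* Hypothetical wrapper around the cast-based expression.
   All arithmetic happens modulo 2^64 in unsigned long long. */
static long long diff_of_products_wrap(long long A, long long B,
                                       long long C, long long D)
{
    unsigned long long ua = (unsigned long long)A;
    unsigned long long ub = (unsigned long long)B;
    unsigned long long uc = (unsigned long long)C;
    unsigned long long ud = (unsigned long long)D;

    unsigned long long t = ua * ub - uc * ud;  /* well-defined wrap-around */

    /* Implementation-defined when t > LLONG_MAX; on common two's
       complement platforms this yields the expected negative value. */
    return (long long)t;
}
```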

If you need a more pedantic solution, I think you have to use "long arithmetic".

This should work (I think):

signed long long int a = 0x7ffffffffffffffd;
signed long long int b = 0x7ffffffffffffffd;
signed long long int c = 0x7ffffffffffffffc;
signed long long int d = 0x7ffffffffffffffe;
signed long long int bd = b / d;
signed long long int bdmod = b % d;
signed long long int ca = c / a;
signed long long int camod = c % a;
signed long long int x = (bd - ca) * a * d - (camod * d - bdmod * a);

Here's my derivation:

x = a * b - c * d
x / (a * d) = (a * b - c * d) / (a * d)
x / (a * d) = b / d - c / a
now, the integer/mod stuff:
x / (a * d) = (b / d + ( b % d ) / d) - (c / a + ( c % a ) / a )
x / (a * d) = (b / d - c / a) - ( ( c % a ) / a - ( b % d ) / d)
x = (b / d - c / a) * a * d - ( ( c % a ) * d - ( b % d ) * a)

E = max(A,B,C,D)
A1 = A -E;
B1 = B -E;
C1 = C -E;
D1 = D -E;

Then

A*B - C*D = (A1+E)*(B1+E)-(C1+E)(D1+E) = (A1+B1-C1-D1)*E + A1*B1 -C1*D1
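A possible C rendering of this identity is shown below. The names max2 and diff_of_products_shifted are invented for the sketch, and it assumes the four inputs are close enough together that the shifted values and their products stay within long long.

```c
/* Hypothetical helper: larger of two values. */
static long long max2(long long x, long long y) { return x > y ? x : y; }

/* Computes A*B - C*D after shifting every operand down by E = max(A,B,C,D).
   Assumption: the differences and their products fit in long long. */
static long long diff_of_products_shifted(long long A, long long B,
                                          long long C, long long D)
{
    long long E  = max2(max2(A, B), max2(C, D));
    long long A1 = A - E, B1 = B - E, C1 = C - E, D1 = D - E;  /* all <= 0 */

    /* (A1+E)*(B1+E) - (C1+E)*(D1+E), with the E*E terms cancelled. */
    return (A1 + B1 - C1 - D1) * E + A1 * B1 - C1 * D1;
}
```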

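You could consider computing a greatest common factor for all your values, dividing them by that factor before doing your arithmetic operations, and then multiplying again afterwards. This assumes that such a factor exists, however (for example, if A, B, C and D happen to be relatively prime, they won't have a common factor).

Similarly, you could consider working on log scales, but this is going to be a little scary because of numerical precision.

If the result fits in a long long int, then the expression A*B-C*D can be evaluated with arithmetic mod 2^64 and will give the correct result. The problem is knowing whether the result fits in a long long int. To detect this, you can use the following trick using doubles: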

if( fabs( (double)A*B - (double)C*D ) > LLONG_MAX ) 
    Overflow
else 
    return A*B-C*D;

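The problem with this approach is that you are limited by the precision of the mantissa of a double (53 bits for an IEEE 754 double), so you need to limit the products A*B and C*D to roughly 63+53 bits (or probably a little less).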
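A sketch of how this check might look as a complete function. The name checked_diff_of_products and the out-parameter convention are assumptions of this example, and the exact result is computed with unsigned wrap-around rather than signed arithmetic. Because of the precision limit just described, the estimate can be off by far more than the true result when both products are huge, so treat the check as a heuristic.

```c
#include <stdbool.h>

/* Hypothetical helper: returns false if a double estimate of A*B - C*D
   falls outside the long long range, otherwise stores the exact result
   (computed modulo 2^64) in *out and returns true. */
static bool checked_diff_of_products(long long A, long long B,
                                     long long C, long long D,
                                     long long *out)
{
    const double limit = 9223372036854775808.0;  /* 2^63 */
    double estimate = (double)A * (double)B - (double)C * (double)D;

    if (estimate >= limit || estimate <= -limit)
        return false;  /* conservatively treated as overflow */

    *out = (long long)((unsigned long long)A * (unsigned long long)B
                     - (unsigned long long)C * (unsigned long long)D);
    return true;
}
```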

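You can write each number as an array, each element being a digit, and do the calculations as polynomials. Take the resulting polynomial, which is an array, and compute the result by multiplying each element of the array by 10 to the power of its position in the array (the first position being the largest and the last being zero).

The number 123 can be expressed as: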

123 = 100 * 1 + 10 * 2 + 3

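You just create an array [1 2 3].

You do this for all the numbers A, B, C and D, and then multiply them as polynomials. Once you have the resulting polynomial, you just reconstruct the number from it.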
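A minimal sketch of the polynomial multiplication on digit arrays. The function name digits_mul is made up, and unlike the [1 2 3] layout above, this version stores the least significant digit first because that keeps the index arithmetic simple. It handles non-negative numbers only.

```c
#include <stddef.h>

/* Hypothetical helper: multiplies two non-negative numbers stored as
   decimal digits, least significant digit first.
   out must have room for na + nb digits. */
static void digits_mul(const int *a, size_t na,
                       const int *b, size_t nb, int *out)
{
    for (size_t k = 0; k < na + nb; ++k)
        out[k] = 0;

    /* Polynomial product: a[i] * b[j] contributes to the 10^(i+j) term. */
    for (size_t i = 0; i < na; ++i)
        for (size_t j = 0; j < nb; ++j)
            out[i + j] += a[i] * b[j];

    /* Carry propagation turns the coefficients back into single digits. */
    int carry = 0;
    for (size_t k = 0; k < na + nb; ++k) {
        int v = out[k] + carry;
        out[k] = v % 10;
        carry  = v / 10;
    }
}
```

For example, multiplying {3, 2, 1} (123) by {5, 4} (45) fills out with {5, 3, 5, 5, 0}, which reads as 05535 from the most significant end.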

While a signed long long int will not hold A*B, two of them will. So A*B can be decomposed into three terms of different exponent, each of which fits in a signed long long int.

A1=A>>32;
A0=A & 0xffffffff;
B1=B>>32;
B0=B & 0xffffffff;
AB_0=A0*B0;
AB_1=A0*B1+A1*B0;
AB_2=A1*B1;

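The same goes for C*D.

Going the straightforward way, the subtraction could be done on every pair of AB_i and CD_i likewise, using an additional carry bit (strictly, a 1-bit integer) for each. So if we say E=A*B-C*D you get something like: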

E_00=AB_0-CD_0 
E_01=(AB_0 > CD_0) == (AB_0 - CD_0 < 0) ? 0 : 1  // carry bit if overflow
E_10=AB_1-CD_1 
...

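We continue by transferring the upper half of E_10 to E_20 (shift by 32 and add, then erase the upper half of E_10).

Now you can get rid of the carry bit E_11 by adding it with the right sign (obtained from the non-carry part) to E_20. If this triggers an overflow, the result wouldn't fit either.

E_10 now has enough "space" to take the upper half from E_00 (shift, add, erase) and the carry bit E_01.

E_10 may be larger again now, so we repeat the transfer to E_20.

At this point, E_20 must become zero, otherwise the result won't fit. The upper half of E_10 is empty as a result of the transfer too.

The final step is to transfer the lower half of E_20 into E_10 again.

If the expectation holds that E=A*B-C*D fits in a signed long long int, we now have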

E_20=0
E_10=0
E_00=E

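If you know the final result is representable in your integer type, you can perform this calculation quickly using the code below. Because the C standard specifies that unsigned arithmetic is modulo arithmetic and does not overflow, you can use an unsigned type to perform the calculation.

The following code assumes there is an unsigned type of the same width and that the signed type uses all bit patterns to represent values (no trap representations, and the minimum of the signed type is the negative of half the modulus of the unsigned type). If this does not hold in a C implementation, simple adjustments can be made for that.

The following uses signed char and unsigned char to demonstrate the code. For your implementation, change the definition of Signed to typedef signed long long int Signed; and the definition of Unsigned to typedef unsigned long long int Unsigned;.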

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

//  Define the signed and unsigned types we wish to use.
typedef signed char   Signed;
typedef unsigned char Unsigned;
//  uHalfModulus is half the modulus of the unsigned type.
static const Unsigned uHalfModulus = UCHAR_MAX/2+1;
//  sHalfModulus is the negation of half the modulus of the unsigned type.
static const Signed   sHalfModulus = -1 - (Signed) (UCHAR_MAX/2);

/*  Map the unsigned value to the signed value that is the same modulo the
    modulus of the unsigned type.  If the input x maps to a positive value, we
    simply return x.  If it maps to a negative value, we return x minus the
    modulus of the unsigned type.
    In most C implementations, this routine could simply be "return x;".
    However, this version uses several steps to convert x to a negative value
    so that overflow is avoided.
*/
static Signed ConvertToSigned(Unsigned x)
{
    /*  If x is representable in the signed type, return it.  (In some
        implementations, 
    */
    if (x < uHalfModulus)
        return x;
    /*  Otherwise, return x minus the modulus of the unsigned type, taking
        care not to overflow the signed type.
    */
    return (Signed) (x - uHalfModulus) + sHalfModulus;
}

/*  Calculate A*B - C*D given that the result is representable as a Signed
    value.
*/
static Signed Calculate(Signed A, Signed B, Signed C, Signed D)
{
    /*  Map signed values to unsigned values.  Positive values are unaltered.
        Negative values have the modulus of the unsigned type added.  Because
        we do modulo arithmetic below, adding the modulus does not change the
        final result.
    */
    Unsigned a = A;
    Unsigned b = B;
    Unsigned c = C;
    Unsigned d = D;
    //  Calculate with modulo arithmetic.
    Unsigned t = a*b - c*d;
    //  Map the unsigned value to the corresponding signed value.
    return ConvertToSigned(t);
}

int main()
{
    //  Test every combination of inputs for signed char.
    for (int A = SCHAR_MIN; A <= SCHAR_MAX; ++A)
    for (int B = SCHAR_MIN; B <= SCHAR_MAX; ++B)
    for (int C = SCHAR_MIN; C <= SCHAR_MAX; ++C)
    for (int D = SCHAR_MIN; D <= SCHAR_MAX; ++D)
    {
        //  Use int to calculate the expected result.
        int t0 = A*B - C*D;
        //  If the result is not representable in signed char, skip this case.
        if (t0 < SCHAR_MIN || SCHAR_MAX < t0)
            continue;
        //  Calculate the result with the sample code.
        int t1 = Calculate(A, B, C, D);
        //  Test the result for errors.
        if (t0 != t1)
        {
            printf("%d*%d - %d*%d = %d, but %d was returned.\n",
                A, B, C, D, t0, t1);
            exit(EXIT_FAILURE);
        }
    }
    return 0;
}

You could try breaking the equation into smaller components which don't overflow.

AB - CD
= [ A(B - N) - C( D - M )] + [AN - CM]
= ( AK - CJ ) + ( AN - CM)
    where K = B - N
          J = D - M

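If the components still overflow, you could break them into smaller components recursively and then recombine.

I may not have covered all of the edge cases, nor have I rigorously tested this, but this implements a technique I remember using in the 80s when trying to do 32-bit integer maths on a 16-bit CPU. Essentially you split the 32 bits into two 16-bit units and work with them separately.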

import java.math.BigInteger;

public class DoubleMaths {
  private static class SplitLong {
    // High half (or integral part).
    private final long h;
    // Low half.
    private final long l;
    // Split.
    private static final int SPLIT = (Long.SIZE / 2);
    // Make from an existing pair.
    private SplitLong(long h, long l) {
      // Let l overflow into h.
      this.h = h + (l >> SPLIT);
      this.l = l % (1l << SPLIT);
    }
    public SplitLong(long v) {
      h = v >> SPLIT;
      l = v % (1l << SPLIT);
    }
    public long longValue() {
      return (h << SPLIT) + l;
    }
    public SplitLong add ( SplitLong b ) {
      // TODO: Check for overflow.
      return new SplitLong ( longValue() + b.longValue() );
    }
    public SplitLong sub ( SplitLong b ) {
      // TODO: Check for overflow.
      return new SplitLong ( longValue() - b.longValue() );
    }
    public SplitLong mul ( SplitLong b ) {
      /*
       * e.g. 10 * 15 = 150
       * 
       * Divide 10 and 15 by 5
       * 
       * 2 * 3 = 6
       * 
       * Must therefore multiply up by 5 * 5 = 25
       * 
       * 6 * 25 = 150
       */
      long lbl = l * b.l;
      long hbh = h * b.h;
      long lbh = l * b.h;
      long hbl = h * b.l;
      return new SplitLong ( lbh + hbl, lbl + hbh );
    }
    @Override
    public String toString () {
      return Long.toHexString(h)+"|"+Long.toHexString(l);
    }
  }
  // I'll use long and int but this can apply just as easily to long-long and long.
  // The aim is to calculate A*B - C*D without overflow.
  static final long A = Long.MAX_VALUE;
  static final long B = Long.MAX_VALUE - 1;
  static final long C = Long.MAX_VALUE;
  static final long D = Long.MAX_VALUE - 2;
  public static void main(String[] args) throws InterruptedException {
    // First do it with BigIntegers to get what the result should be.
    BigInteger a = BigInteger.valueOf(A);
    BigInteger b = BigInteger.valueOf(B);
    BigInteger c = BigInteger.valueOf(C);
    BigInteger d = BigInteger.valueOf(D);
    BigInteger answer = a.multiply(b).subtract(c.multiply(d));
    System.out.println("A*B - C*D = "+answer+" = "+answer.toString(16));
    // Make one and test its integrity.
    SplitLong sla = new SplitLong(A);
    System.out.println("A="+Long.toHexString(A)+" ("+sla.toString()+") = "+Long.toHexString(sla.longValue()));
    // Start small.
    SplitLong sl10 = new SplitLong(10);
    SplitLong sl15 = new SplitLong(15);
    SplitLong sl150 = sl10.mul(sl15);
    System.out.println("10="+sl10.longValue()+"("+sl10.toString()+") * 15="+sl15.longValue()+"("+sl15.toString()+") = "+sl150.longValue() + " ("+sl150.toString()+")");
    // The real thing.
    SplitLong slb = new SplitLong(B);
    SplitLong slc = new SplitLong(C);
    SplitLong sld = new SplitLong(D);
    System.out.println("B="+Long.toHexString(B)+" ("+slb.toString()+") = "+Long.toHexString(slb.longValue()));
    System.out.println("C="+Long.toHexString(C)+" ("+slc.toString()+") = "+Long.toHexString(slc.longValue()));
    System.out.println("D="+Long.toHexString(D)+" ("+sld.toString()+") = "+Long.toHexString(sld.longValue()));
    SplitLong sanswer = sla.mul(slb).sub(slc.mul(sld));
    System.out.println("A*B - C*D = "+sanswer+" = "+sanswer.longValue());
  }
}

Prints:

A*B - C*D = 9223372036854775807 = 7fffffffffffffff
A=7fffffffffffffff (7fffffff|ffffffff) = 7fffffffffffffff
10=10(0|a) * 15=15(0|f) = 150 (0|96)
B=7ffffffffffffffe (7fffffff|fffffffe) = 7ffffffffffffffe
C=7fffffffffffffff (7fffffff|ffffffff) = 7fffffffffffffff
D=7ffffffffffffffd (7fffffff|fffffffd) = 7ffffffffffffffd
A*B - C*D = 7fffffff|ffffffff = 9223372036854775807

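which looks to me like it's working.

I bet I've missed some of the subtleties, such as watching for sign overflow, but I think the essence is there.

For the sake of completeness, since no one has mentioned it, some compilers (e.g. GCC) actually provide you with a 128-bit integer nowadays.

Thus an easy solution could be: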

(long long)((__int128)A * B - (__int128)C * D)
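If you also want to know whether the result actually fits, the wide value can be range-checked before narrowing. The sketch below assumes a compiler that supports the __int128 extension (GCC and Clang on 64-bit targets); the function name diff_of_products_i128 and the out-parameter are inventions of this example.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: computes A*B - C*D in 128 bits (compiler extension).
   Returns false if the exact result does not fit in long long. */
static bool diff_of_products_i128(long long A, long long B,
                                  long long C, long long D,
                                  long long *out)
{
    /* Each product is at most 2^126 in magnitude, so the difference
       cannot overflow 128 bits. */
    __int128 wide = (__int128)A * B - (__int128)C * D;

    if (wide < LLONG_MIN || wide > LLONG_MAX)
        return false;

    *out = (long long)wide;
    return true;
}
```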

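AB-CD = (AB-CD) * AC / AC = (B/C-D/A)*A*C. Neither B/C nor D/A can overflow, so calculate (B/C-D/A) first. Since the final result won't overflow according to your definition, you can safely perform the remaining multiplications and calculate (B/C-D/A)*A*C, which is the required result.

Note that if your inputs can also be extremely small, B/C or D/A can overflow. If that is possible, more complex manipulations might be required, depending on inspection of the inputs.

Choose K = a big number (e.g. K = A - sqrt(A))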

A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D); // Avoid overflow.

Why?

(A-K)*(B-K) = A*B - K*(A+B) + K^2
(C-K)*(D-K) = C*D - K*(C+D) + K^2
=>
(A-K)*(B-K) - (C-K)*(D-K) = A*B - K*(A+B) + K^2 - {C*D - K*(C+D) + K^2}
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B) + K*(C+D) + K^2 - K^2
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B-C-D)
=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A+B-C-D)
=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D)

Note that since A, B, C and D are big numbers, A-C and B-D are small numbers.
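The identity can be tried out with a small C function. In this sketch K is passed in by the caller instead of being derived from sqrt(A), and the name diff_of_products_offset is hypothetical. It assumes K is chosen so that the shifted products and K*(A-C+B-D) all fit in long long.

```c
/* Hypothetical helper: computes A*B - C*D using an offset K, following
   A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D).
   Assumption: K is close enough to the inputs that nothing overflows. */
static long long diff_of_products_offset(long long A, long long B,
                                         long long C, long long D,
                                         long long K)
{
    long long shifted = (A - K) * (B - K) - (C - K) * (D - K);
    return shifted + K * ((A - C) + (B - D));
}
```

For the example from the question, with MAX = LLONG_MAX - 1 and K = MAX - 1000, the shifted products are 1000*1000 and 999*1001, and the correction term vanishes because (A - C) + (B - D) is zero.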