# Overflow 2.5.9 For Mac

For the routines cublasgemmEx and cublasGemmEx, when the compute type is greater than the output type, the sum of the split chunks can potentially lead to some intermediate overflows thus producing a final resulting matrix with some overflows. Those overflows might not have occurred if all the dot products had been accumulated in the compute type before being converted at the end in the output type. This computation side-effect can be easily exposed when the computeType is CUDA_R_32F and Atype, Btype and Ctype are in CUDA_R_16F. This behavior can be controlled using the compute precision mode CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION with cublasSetMathMode()

## Overflow 2.5.9 For Mac

This function computes the Euclidean norm of the vector x. The code uses a multiphase model of accumulation to avoid intermediate underflow and overflow, with the result being equivalent to \(\sqrt\sum_i = 1^n\left( \mathbfx\lbrack j\rbrack \times \mathbfx\lbrack j\rbrack \right)\) where \(j = 1 + \left( i - 1 \right)*\textincx\) in exact arithmetic. Notice that the last equation reflects 1-based indexing used for compatibility with Fortran.

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on those aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating-point standard, and concludes with numerous examples of how computer builders can better support floating-point.

Additional Key Words and Phrases: Denormalized number, exception, floating-point, floating-point standard, gradual underflow, guard digit, NaN, overflow, relative error, rounding error, rounding mode, ulp, underflow.

Referring to TABLE D-1, single precision has emax = 127 and emin = -126. The reason for having emin

Just as NaNs provide a way to continue a computation when expressions like 0/0 or are encountered, infinities provide a way to continue when an overflow occurs. This is much safer than simply returning the largest representable number. As an example, consider computing , when = 10, p = 3, and emax = 98. If x = 3 1070 and y = 4 1070, then x2 will overflow, and be replaced by 9.99 1098. Similarly y2, and x2 + y2 will each overflow in turn, and be replaced by 9.99 1098. So the final result will be , which is drastically wrong: the correct answer is 5 1070. In IEEE arithmetic, the result of x2 is , as is y2, x2 + y2 and . So the final result is , which is safer than returning an ordinary floating-point number that is nowhere near the correct answer.17

When a subexpression evaluates to a NaN, the value of the entire expression is also a NaN. In the case of however, the value of the expression might be an ordinary floating-point number because of rules like 1/ = 0. Here is a practical example that makes use of the rules for infinity arithmetic. Consider computing the function x/(x2 + 1). This is a bad formula, because not only will it overflow when x is larger than , but infinity arithmetic will give the wrong answer because it will yield 0, rather than a number near 1/x. However, x/(x2 + 1) can be rewritten as 1/(x + x-1). This improved expression will not overflow prematurely and because of infinity arithmetic will have the correct value when x = 0: 1/(0 + 0-1) = 1/(0 + ) = 1/ = 0. Without infinity arithmetic, the expression 1/(x + x-1) requires a test for x = 0, which not only adds extra instructions, but may also disrupt a pipeline. This example illustrates a general fact, namely that infinity arithmetic often avoids the need for special case checking; however, formulas need to be carefully inspected to make sure they do not have spurious behavior at infinity (as x/(x2 + 1) did).

Zero is represented by the exponent emin - 1 and a zero significand. Since the sign bit can take on two different values, there are two zeros, +0 and -0. If a distinction were made when comparing +0 and -0, simple tests like if (x = 0) would have very unpredictable behavior, depending on the sign of x. Thus the IEEE standard defines comparison so that +0 = -0, rather than -0

suffers from the problem that if either component of the denominator c + id is larger than , the formula will overflow, even though the final result may be well within range. A better method of computing the quotients is to use Smith's formula:

When an exceptional condition like division by zero or overflow occurs in IEEE arithmetic, the default is to deliver a result and continue. Typical of the default results are NaN for 0/0 and , and for 1/0 and overflow. The preceding sections gave examples where proceeding from an exception with these default values was the reasonable thing to do. When any exception occurs, a status flag is also set. Implementations of the IEEE standard are required to provide users with a way to read and write the status flags. The flags are "sticky" in that once set, they remain set until explicitly cleared. Testing the flags is the only way to distinguish 1/0, which is a genuine infinity from an overflow.

The IEEE standard divides exceptions into 5 classes: overflow, underflow, division by zero, invalid operation and inexact. There is a separate status flag for each class of exception. The meaning of the first three exceptions is self-evident. Invalid operation covers the situations listed in TABLE D-3, and any comparison that involves a NaN. The default result of an operation that causes an invalid exception is to return a NaN, but the converse is not true. When one of the operands to an operation is a NaN, the result is a NaN but no invalid exception is raised unless the operation also satisfies one of the conditions in TABLE D-3.20 TABLE D-4 Exceptions in IEEE 754* Exception Result when traps disabled Argument to trap handler overflow or xmax round(x2-) underflow 0, or denormal round(x2) divide by zero operands invalid NaN operands inexact round(x) round(x)

There is a more interesting use for trap handlers that comes up when computing products such as that could potentially overflow. One solution is to use logarithms, and compute exp instead. The problem with this approach is that it is less accurate, and that it costs more than the simple expression , even if there is no overflow. There is another solution using trap handlers called over/underflow counting that avoids both of these problems [Sterbenz 1974].

The idea is as follows. There is a global counter initialized to zero. Whenever the partial product overflows for some k, the trap handler increments the counter by one and returns the overflowed quantity with the exponent wrapped around. In IEEE 754 single precision, emax = 127, so if pk = 1.45 2130, it will overflow and cause the trap handler to be called, which will wrap the exponent back into range, changing pk to 1.45 2-62 (see below). Similarly, if pk underflows, the counter would be decremented, and negative exponent would get wrapped around into a positive one. When all the multiplications are done, if the counter is zero then the final product is pn. If the counter is positive, the product overflowed, if the counter is negative, it underflowed. If none of the partial products are out of range, the trap handler is never called and the computation incurs no extra cost. Even if there are over/underflows, the calculation is more accurate than if it had been computed with logarithms, because each pk was computed from pk - 1 using a full precision multiply. Barnett [1987] discusses a formula where the full accuracy of over/underflow counting turned up an error in earlier tables of that formula.

IEEE 754 specifies that when an overflow or underflow trap handler is called, it is passed the wrapped-around result as an argument. The definition of wrapped-around for overflow is that the result is computed as if to infinite precision, then divided by 2, and then rounded to the relevant precision. For underflow, the result is multiplied by 2. The exponent is 192 for single precision and 1536 for double precision. This is why 1.45 x 2130 was transformed into 1.45 2-62 in the example above.

In the IEEE standard, rounding occurs whenever an operation has a result that is not exact, since (with the exception of binary decimal conversion) each operation is computed exactly and then rounded. By default, rounding means round toward nearest. The standard requires that three other rounding modes be provided, namely round toward 0, round toward +, and round toward -. When used with the convert to integer operation, round toward - causes the convert to become the floor function, while round toward + is ceiling. The rounding mode affects overflow, because when round toward 0 or round toward - is in effect, an overflow of positive magnitude causes the default result to be the largest representable number, not +. Similarly, overflows of negative magnitude will produce the largest negative number when round toward + or round toward 0 is in effect.