NAME
ieee - IEEE standard 754 for floating-point arithmetic

DESCRIPTION
The IEEE Standard 754 for Binary Floating-Point Arithmetic defines representations of floating-point numbers and abstract properties of arithmetic operations relating to precision, rounding, and exceptional cases, as described below.

IEEE STANDARD 754 Floating-Point Arithmetic
Radix: Binary.

Overflow and underflow: Overflow goes by default to a signed infinity.
Underflow is gradual.

Zero is represented ambiguously as +0 or -0. Its sign transforms correctly through multiplication
or division, and is preserved by addition of zeros with like signs; but x-x
yields +0 for every finite x. The only operations that reveal zero's sign are
division by zero and copysign(x, ±0). In particular, comparison (x > y, x
≥ y, etc.) cannot be affected by the sign of zero; but if finite x = y
then infinity = 1/(x-y) ≠ -1/(y-x) = -infinity.

Infinity is signed. It persists when added to itself or to any finite
number. Its sign transforms correctly through multiplication and division, and
    (finite)/±infinity = ±0
    (nonzero)/0 = ±infinity.
But infinity-infinity, infinity∗0 and
infinity/infinity are, like 0/0 and sqrt(-3), invalid operations that produce
NaN. ...
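
The rules for signed zero and infinity stated above can be checked with a short C sketch. The helper names below are hypothetical and exist only for this illustration. The sketch assumes the default rounding mode and a compiler that is not asked to relax IEEE semantics (for example, no -ffast-math); volatile is used only to keep the arithmetic from being folded at compile time.

```c
#include <math.h>

/* All function names here are hypothetical helpers for illustration. */

/* +0 and -0 compare equal, yet division by them yields opposite infinities. */
int zero_sign_is_hidden_from_comparison(void)
{
    volatile double pz = 0.0, nz = -0.0, one = 1.0;
    return pz == nz && one / pz > 0.0 && one / nz < 0.0;
}

/* copysign() is the other operation that reveals the sign of a zero. */
int copysign_reveals_zero_sign(void)
{
    volatile double pz = 0.0, nz = -0.0;
    return copysign(1.0, pz) == 1.0 && copysign(1.0, nz) == -1.0;
}

/* x - x is +0 for a finite x (under the default rounding mode). */
int x_minus_x_is_positive_zero(double x)
{
    volatile double v = x;
    double d = v - v;
    return d == 0.0 && !signbit(d);
}

/* For finite x = y: 1/(x-y) is +infinity while -1/(y-x) is -infinity. */
int reciprocals_of_equal_difference_disagree(double x)
{
    volatile double a = x, b = x;
    return 1.0 / (a - b) == INFINITY && -1.0 / (b - a) == -INFINITY;
}

/* Infinity persists under addition, (finite)/infinity is 0,
 * (nonzero)/0 is a signed infinity, and the invalid combinations give NaN. */
int infinity_rules_hold(void)
{
    volatile double inf = INFINITY, zero = 0.0;
    return inf + inf == inf
        && inf + 1.0e308 == inf
        && 1.0 / inf == 0.0
        && -1.0 / zero == -INFINITY
        && isnan(inf - inf)
        && isnan(inf * zero)
        && isnan(inf / inf);
}
```

Each helper returns 1 when the stated property holds on the host, so a caller can simply test the return value.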
Reserved operands (NaNs): NaN stands for Not a
Number. Some NaNs, called Signaling NaNs, trap any floating-point
operation performed upon them; they are used to mark missing or uninitialized
values, or nonexistent elements of arrays. The rest are Quiet NaNs; they are
the default results of Invalid Operations, and propagate through subsequent
arithmetic operations. If x ≠ x then x is NaN; every other predicate (x
> y, x = y, x < y, ...) is FALSE if NaN is involved.
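
A minimal C sketch of the NaN predicates just described follows. The function names are hypothetical, and the sketch covers only Quiet NaNs; it assumes the compiler preserves IEEE comparison semantics (no -ffast-math or similar).

```c
#include <math.h>

/* Hypothetical helper: a Quiet NaN obtained as the default result of
 * the invalid operation 0/0, evaluated at run time. */
double make_quiet_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* The x != x test from the text: true only when x is a NaN. */
int is_nan_by_self_compare(double x)
{
    volatile double v = x;   /* keeps the compiler from simplifying v != v */
    return v != v;
}

/* Counts how many of the predicates >, >=, ==, <=, < hold for (a, b).
 * With a NaN operand every one of them is false, so the count is 0. */
int count_true_predicates(double a, double b)
{
    return (a > b) + (a >= b) + (a == b) + (a <= b) + (a < b);
}

/* A Quiet NaN propagates through subsequent arithmetic operations. */
double propagate(double x)
{
    return (x + 1.0) * 2.0 - 3.0;
}
```

For ordinary operands such as (1.0, 1.0) the count is 3 (>=, == and <= hold); for any pair involving a NaN it is 0, and propagate() returns a NaN whenever its argument is one.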
Rounding: Every algebraic operation (+, -, ∗, /,
√) is rounded by default to within half an ulp, and
when the rounding error is exactly half an ulp then the
rounded value's least significant bit is zero. (An ulp is
one Unit in the Last
Place.) This kind of rounding is usually the best kind,
sometimes provably so; for instance, for every x = 1.0, 2.0, 3.0, 4.0, ...,
2.0**52, we find (x/3.0)∗3.0 == x and (x/10.0)∗10.0 == x and ...
even though both the quotients and the products have been rounded. Only
rounding like IEEE 754 can do that. But no single kind of rounding can be
proved best for every circumstance, so IEEE 754 provides rounding towards zero
or towards +infinity or towards -infinity at the programmer's option.
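Exceptions: IEEE 754 recognizes five kinds of floating-point
exceptions, listed below in declining order of probable importance.
    Exception            Default Result
    Invalid Operation    NaN, or FALSE
    Overflow             ±infinity
    Divide by Zero       ±infinity
    Underflow            Gradual Underflow
    Inexact              Rounded value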
NOTE: An Exception is not an Error unless handled badly. What makes a class of exceptions exceptional is that no single default response can be satisfactory in every instance. On the other hand, if a default response will serve most instances satisfactorily, the unsatisfactory instances cannot justify aborting computation every time the exception occurs.

Data Formats
Single-precision:
Type name: float
Wordsize: 32 bits.
Precision: 24 significant bits, roughly like 7 significant decimals. If x and x' are consecutive positive single-precision numbers (they differ by 1 ulp), then
    5.9e-08 < 0.5**24 < (x' - x)/x ≤ 0.5**23 < 1.2e-07.
Underflowed results round to the nearest integer multiple of 0.5**149 = 1.4e-45.
Double-precision:
Type name: double (On some architectures, long double is the same as double)
Wordsize: 64 bits.
Precision: 53 significant bits, roughly like 16 significant decimals. If x and x' are consecutive positive double-precision numbers (they differ by 1 ulp), then
    1.1e-16 < 0.5**53 < (x' - x)/x ≤ 0.5**52 < 2.3e-16.
Underflowed results round to the nearest integer multiple of 0.5**1074 = 4.9e-324.
Extended-precision:
Type name: long double (when supported by the hardware)
Wordsize: 96 bits.
Precision: 64 significant bits, roughly like 19 significant decimals. If x and x' are consecutive positive extended-precision numbers (they differ by 1 ulp), then
    5.4e-20 < 0.5**64 < (x' - x)/x ≤ 0.5**63 < 1.1e-19.
Underflowed results round to the nearest integer multiple of 0.5**16445 = 3.6e-4951.
Quad-extended-precision:
Type name: long double (when supported by the hardware)
Wordsize: 128 bits.
Precision: 113 significant bits, roughly like 34 significant decimals. If x and x' are consecutive positive quad-extended-precision numbers (they differ by 1 ulp), then
    9.6e-35 < 0.5**113 < (x' - x)/x ≤ 0.5**112 < 2.0e-34.
Underflowed results round to the nearest integer multiple of 0.5**16494 = 6.5e-4966.
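
Because the meaning of long double depends on the hardware, a program can ask <float.h> which of the formats above it received. The following is a small sketch; the struct and function names are hypothetical. Note that the storage size reported by sizeof may include padding beyond the word size listed above.

```c
#include <float.h>
#include <limits.h>

/* Hypothetical record describing the host's long double. */
struct fp_format {
    int precision_bits;   /* significant bits (LDBL_MANT_DIG)       */
    int storage_bits;     /* bits occupied in memory, with padding  */
    int decimal_digits;   /* decimal digits that survive a round trip */
};

struct fp_format describe_long_double(void)
{
    struct fp_format f;
    f.precision_bits = LDBL_MANT_DIG;
    f.storage_bits   = (int)(sizeof(long double) * CHAR_BIT);
    f.decimal_digits = LDBL_DIG;
    return f;
}

/* True when long double offers no more precision than double. */
int long_double_is_plain_double(void)
{
    return LDBL_MANT_DIG == DBL_MANT_DIG;
}
```

A precision of 53 bits indicates that long double is the same as double, 64 bits indicates the extended format, and 113 bits the quad-extended format; other values are possible on hardware outside the scope of this page.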

Additional Information Regarding Exceptions
For each kind of floating-point exception, IEEE 754 provides a Flag that is raised each time its exception is signaled, and stays raised until the program resets it. Programs may also test, save and restore a flag. Thus, IEEE 754 provides three ways by which programs may cope with exceptions for which the default result might be unsatisfactory:
At the option of an implementor conforming to IEEE 754, other ways to cope with exceptions may be provided:
Ideally, each elementary function should act as if it were indivisible, or atomic, in the sense that ...
The functions in

SEE ALSO
fenv(3), ieee_test(3), math(3)

An explanation of IEEE 754 and its proposed extension p854 was published in the IEEE magazine MICRO in August 1984 under the title "A Proposed Radix- and Word-length-independent Standard for Floating-point Arithmetic" by W. J. Cody et al. The manuals for Pascal, C and BASIC on the Apple Macintosh document the features of IEEE 754 pretty well. Articles in the IEEE magazine COMPUTER vol. 14 no. 3 (Mar. 1981), and in the ACM SIGNUM Newsletter Special Issue of Oct. 1979, may be helpful although they pertain to superseded drafts of the standard.

STANDARDS
IEEE Std 754-1985