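Below are the abstract and introduction of David Goldberg's paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic," a key reference on floating-point, as hosted on the Oracle documentation website:
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html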
Abstract:
Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising because floating-point is ubiquitous in computer systems. Almost every language has a floating-point
datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such
as overflow. This paper presents a tutorial on those aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE
floating-point standard, and concludes with numerous examples of how computer builders can better support floating-point.
Categories and Subject Descriptors: (Primary) C.0 [Computer Systems Organization]: General --
instruction set design; D.3.4 [Programming Languages]: Processors --
compilers, optimization; G.1.0 [Numerical Analysis]: General -- computer arithmetic, error analysis, numerical algorithms.
(Secondary) D.2.1 [Software Engineering]: Requirements/Specifications --
languages; D.3.4 [Programming Languages]: Formal Definitions and Theory --
semantics; D.4.1 [Operating Systems]: Process Management -- synchronization.
General Terms: Algorithms, Design, Languages
Additional Key Words and Phrases: Denormalized number, exception, floating-point, floating-point standard, gradual underflow, guard digit, NaN, overflow, relative error, rounding error, rounding
mode, ulp, underflow.
Introduction
Builders of computer systems often need information about floating-point arithmetic. There are, however, remarkably few sources of detailed information about it. One of the few books on the subject,
Floating-Point Computation by Pat Sterbenz, is long out of print. This paper is a tutorial on those aspects of floating-point arithmetic (floating-point
hereafter) that have a direct connection to systems building. It consists of three loosely connected parts. The first section,
Rounding Error, discusses the implications of using different rounding strategies for the basic operations of addition, subtraction, multiplication and division. It also contains background information on the two methods of measuring rounding error, ulps and relative error. The second part discusses the IEEE floating-point standard, which is becoming rapidly accepted by commercial hardware manufacturers. Included in the IEEE standard is the rounding method for basic operations. The
discussion of the standard draws on the material in the section Rounding Error. The third part discusses the connections between floating-point and the design of various aspects of computer systems. Topics include instruction set design, optimizing compilers and exception handling.
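
To make the two error measures mentioned above concrete, here is a minimal Python sketch (not taken from the paper) that measures the error of storing 1/10 as a double, both in ulps and as a relative error. The helper name rounding_error is hypothetical, and the sketch assumes IEEE double precision and Python 3.9 or later for math.ulp.

```python
from fractions import Fraction
import math


def rounding_error(exact: Fraction, approx: float) -> tuple[float, float]:
    """Return (error in ulps, relative error) of `approx` against `exact`.

    Hypothetical helper for illustration. Fraction(approx) converts the
    double to its exact rational value, so the error itself is computed
    without any further rounding.
    """
    abs_err = abs(Fraction(approx) - exact)
    # math.ulp(approx) is the spacing between approx and the next double.
    ulps = abs_err / Fraction(math.ulp(approx))
    rel = abs_err / abs(exact)
    return float(ulps), float(rel)


# 1/10 has no finite binary expansion, so the literal 0.1 is rounded.
ulps, rel = rounding_error(Fraction(1, 10), 0.1)
print(f"error in ulps:  {ulps}")
print(f"relative error: {rel}")
```

With round-to-nearest, the error of a single rounding should not exceed half an ulp, and for doubles the relative error should not exceed 2**-53; the values printed for 0.1 fall within both bounds.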
I have tried to avoid making statements about floating-point without also giving reasons why the statements are true, especially since the justifications involve
nothing more complicated than elementary calculus. Those explanations that are not central to the main argument have been grouped into a section called "The Details," so that they can be skipped if desired. In particular, the proofs of many of the theorems
appear in this section. The end of each proof is marked with the ❚ symbol. When a proof is not included, the ❚ appears immediately following the statement of the theorem.