Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root

Lucas M. Dutton; Christopher Kumar Anand; Robert Enenkel; Silvia Melitta Müller

Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root

Lucas M. Dutton, Christopher Kumar Anand, Robert Enenkel, Silvia Melitta Müller

TL;DR

This work addresses the challenge of achieving correctly rounded results for floating-point reciprocals, divisions, and square roots under IEEE-754, by introducing a residual-based Final Correction algorithm that can operate on imperfect, lower-precision estimates. Central to the approach are irrepresentability theorems proving that midpoints between representable FP numbers cannot occur as reciprocals, divisions, or square roots, allowing the correction to avoid equality tests and rely on simple ulp-based adjustments. The authors present a concrete, hardware-friendly procedure that uses a residual $r = 1 - ext{approx}(1/x) imes x$ and a correction term in ulps to yield a correctly rounded result with only additions and multiplications, along with several implementation variants and exhaustive single-precision testing. The contributions extend prior results by providing a more general and tolerant framework for correction, with potential impact on FPUs and co-processors, and outline future work on reciprocal square roots and mixed-precision scenarios. Overall, the work offers a pragmatic pathway to high-accuracy, efficiently implementable floating-point operations that adhere to IEEE-754 rounding semantics across modes.

Abstract

Floating-point arithmetic performance determines the overall performance of important applications, from graphics to AI. Meeting the IEEE-754 specification for floating-point requires that final results of addition, subtraction, multiplication, division, and square root are correctly rounded based on the user-selected rounding mode. A frustrating fact for implementers is that naive rounding methods will not produce correctly rounded results even when intermediate results with greater accuracy and precision are available. In contrast, our novel algorithm can correct approximations of reciprocal, division and square root, even ones with slightly lower than target precision. In this paper, we present a family of algorithms that can both increase the accuracy (and potentially the precision) of an estimate and correctly round it according to all binary IEEE-754 rounding modes. We explain how it may be efficiently implemented in hardware, and for completeness, we present proofs that it is not necessary to include equality tests associated with round-to-nearest-even mode for reciprocal, division and square root functions, because it is impossible for input(s) in a given precision to have exact answers exactly midway between representable floating-point numbers in that precision. In fact, our simpler proofs are sometimes stronger.

Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root

TL;DR

and a correction term in ulps to yield a correctly rounded result with only additions and multiplications, along with several implementation variants and exhaustive single-precision testing. The contributions extend prior results by providing a more general and tolerant framework for correction, with potential impact on FPUs and co-processors, and outline future work on reciprocal square roots and mixed-precision scenarios. Overall, the work offers a pragmatic pathway to high-accuracy, efficiently implementable floating-point operations that adhere to IEEE-754 rounding semantics across modes.

Abstract

Paper Structure (15 sections, 8 theorems, 15 equations, 4 figures, 1 table, 3 algorithms)

This paper contains 15 sections, 8 theorems, 15 equations, 4 figures, 1 table, 3 algorithms.

Introduction
Background
Floating-Point Numbers
Unit in Last Place
Rounding
Rounding Theorems
Representable Numbers
Reciprocals cannot be midpoints
Divisions cannot be midpoints
Theorems on Square Roots
Correction Algorithm
Simplifications
Testing
Prior Work
Conclusions and future work

Key Result

Theorem 1

Let $n>1$ be a floating-point precision and $A \in (2^{n-1}, 2^n)$ the numerator of the representable number $\frac{A}{2^{n-1}}$. There does not exist $m>1$ and $B \in (2^{m-1}, 2^m)$ such that $\frac{B}{2^{m}} = \frac{2^{n-1}}{A}$, i.e, the inverse of $\frac{A}{2^{n-1}}$.

Figures (4)

Figure 1: single-precision binary floating-point layout
Figure 2: Cases for rounding to nearest even
Figure 3: Number line of the ulp error and $\delta$
Figure 4: Circuit diagram of the Final Correction method for single-precision reciprocal

Theorems & Definitions (10)

Definition 1: Unit in Last Place
Definition 2
Theorem 1
corollary 1: Reciprocals cannot be midpoints
Theorem 2
lemma 1
lemma 2
lemma 3
lemma 4
Theorem 3

Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root

TL;DR

Abstract

Inexactness and Correction of Floating-Point Reciprocal, Division and Square Root

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (4)

Theorems & Definitions (10)