Effective Quadratic Error Bounds for Floating-Point Algorithms Computing the Hypotenuse Function

Jean-Michel Muller; Bruno Salvy

TL;DR

This work develops generic, analytic quadratic error bounds for floating-point algorithms computing basic functions, notably the hypotenuse. By recasting error propagation as a polynomial optimization problem and exploiting the discrete structure of FP numbers, the authors produce bounds of the form $\alpha u+\beta u^2$ that hold for all $u$ up to a specified maximum, often achieving asymptotically optimal linear terms and tight quadratic terms. The approach combines step-by-step FP analysis, gradient-based optimization on triangular polynomial systems, sign tests, and regular chains, implemented in a Maple prototype to analyze several hypotenuse algorithms (NaiveHypot, simplest-scaling, Beebe-alg, Borges-fused, and Kahan). Results demonstrate bounds tighter than existing tools (Gappa, Satire) for small, building-block programs, with explicit proofs and runnable Maple worksheets, highlighting the potential for computer-aided proofs in numerical analysis. The work advances reliable, reusable error analysis for low-precision FP formats, with practical impact on algorithm selection, software correctness, and hardware-aware numerical computing.

Abstract

We provide tools to help automate the error analysis of algorithms that evaluate simple functions over the floating-point numbers. The aim is to obtain tight relative error bounds for these algorithms, expressed as a function of the unit round-off. Due to the discrete nature of the set of floating-point numbers, the largest errors are often intrinsically "arithmetic" in the sense that their appearance may depend on specific bit patterns in the binary representations of intermediate variables, which may be present only for some precisions. We focus on generic (i.e., parameterized by the precision) and analytic over-estimations that still capture the correlations between the errors made at each step of the algorithms. Using methods from computer algebra, which we adapt to the particular structure of the polynomial systems that encode the errors, we obtain bounds with a linear term in the unit round-off that is sharp in many cases. An explicit quadratic bound is given, rather than the $O()$-estimate that is more common in this area. This is particularly important when using low-precision formats, which are increasingly common in modern processors. Using this approach, we compare five algorithms for computing the hypotenuse function, ranging from elementary to quite challenging.
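To make the quantity being bounded concrete, here is a minimal sketch that measures the relative error of a naive hypotenuse evaluation, $\sqrt{x^2+y^2}$ with every operation rounded, in multiples of the unit round-off $u$. It is only an empirical illustration, not the authors' Maple prototype: the function names, the choice of binary64 ($u=2^{-53}$), the sampling range $[1,2]$, and the 60-digit decimal reference are all assumptions made for this example. Random sampling gives a lower estimate of the worst case, whereas the paper's bounds are proved for all inputs.

```python
import math
import random
from decimal import Decimal, getcontext

getcontext().prec = 60          # reference precision, far beyond binary64
U = Decimal(2) ** -53           # unit round-off u = 2^-p for binary64 (p = 53)

def naive_hypot(x: float, y: float) -> float:
    """sqrt(x*x + y*y), each operation rounded to nearest in binary64."""
    return math.sqrt(x * x + y * y)

def rel_error_in_u(x: float, y: float) -> Decimal:
    """Relative error of naive_hypot(x, y), expressed in multiples of u."""
    dx, dy = Decimal(x), Decimal(y)        # float -> Decimal is exact
    exact = (dx * dx + dy * dy).sqrt()     # 60-digit reference value
    return abs(Decimal(naive_hypot(x, y)) - exact) / exact / U

random.seed(0)                             # reproducible sample
worst = max(
    rel_error_in_u(random.uniform(1.0, 2.0), random.uniform(1.0, 2.0))
    for _ in range(20000)
)
print(f"largest observed relative error: {float(worst):.3f} u")
```

Inputs are kept in $[1,2]$ so that overflow and underflow cannot occur; the observed maximum should stay below roughly $2u$, which is the order of magnitude of the classical bound for this naive scheme.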

Paper Structure (86 sections, 17 theorems, 153 equations, 1 figure, 4 tables, 5 algorithms)

Introduction
Motivation
Basics of Floating-Point Arithmetic
Which kind of error bounds?
Generic and analytic bounds
Genericity versus optimality
Recent results
Recent results in computer arithmetic
Recent work on automatic analysis
Contribution
Running example: the hypotenuse function
Main tool: Polynomial Optimization
Prototype Implementation
Structure of the article
Acknowledgements
...and 71 more sections

Key Result

Lemma 2.1

If $a$ and $b$ are floating-point numbers satisfying $a/2 \leq b \leq 2a$ then $b-a$ is a floating-point number, which implies $\textnormal{RN}(b-a) = b-a$.
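The lemma can be checked empirically in binary64 with exact rational arithmetic. The sketch below is an illustration under stated assumptions: the helper name `subtraction_is_exact`, the sampling range, and the seed are all hypothetical choices, and a finite sample is of course not a proof.

```python
import random
from fractions import Fraction

def subtraction_is_exact(a: float, b: float) -> bool:
    """True when the rounded difference b - a equals the exact rational one."""
    return Fraction(b - a) == Fraction(b) - Fraction(a)

random.seed(1)                            # reproducible sample
checked = 0
for _ in range(10000):
    a = random.uniform(1e-3, 1e3)
    b = a * random.uniform(0.5, 2.0)      # candidate near a (product is rounded)
    if a / 2 <= b <= 2 * a:               # keep only pairs meeting the hypothesis
        assert subtraction_is_exact(a, b)
        checked += 1
print(f"b - a was exact for all {checked} sampled pairs")

# Outside the hypothesis the difference is generally rounded:
print(subtraction_is_exact(1.0, 2.0 ** -60))   # False
```

The comparisons `a / 2 <= b` and `b <= 2 * a` are themselves exact here, since scaling by 2 only changes the exponent for values in this range.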

Figures (1)

Figure 1: Left: absolute error (in ulps) of rounding to nearest $x \in [\frac{1}{2},16]$. Right: relative error (in multiples of $u=2^{-p}$) of rounding to nearest $x \in [\frac{1}{2},16]$. Both pictures assume a binary floating-point system with $p=5$.
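The right-hand panel can be reproduced numerically. The following sketch models a toy binary format with $p=5$ using exact rationals and evaluates the relative rounding error on a grid over $[\frac{1}{2},16]$. The function `round_to_nearest`, the unbounded exponent range, the ties-to-even rule, and the grid step $2^{-10}$ are assumptions of this example rather than details taken from the paper.

```python
from fractions import Fraction

P = 5                        # precision of the toy format
U = Fraction(1, 2 ** P)      # unit round-off u = 2^-p

def round_to_nearest(x: Fraction, p: int = P) -> Fraction:
    """Round x > 0 to the nearest p-bit binary float (ties to even)."""
    e = 0
    while Fraction(2) ** (e + 1) <= x:    # find e with 2^e <= x < 2^(e+1)
        e += 1
    while Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)      # spacing of floats in [2^e, 2^(e+1))
    return round(x / ulp) * ulp           # round() on Fraction is half-to-even

def rel_error_in_u(x: Fraction) -> Fraction:
    """Relative rounding error of x, in multiples of u."""
    return abs(round_to_nearest(x) - x) / x / U

# Grid over [1/2, 16] with step 2^-10; it contains every rounding midpoint.
xs = [Fraction(1, 2) + Fraction(k, 1024) for k in range(15872 + 1)]
worst = max(rel_error_in_u(x) for x in xs)
print(worst, "=", float(worst), "u")
```

On this grid the maximum equals $1/(1+u) = 32/33$ in multiples of $u$, attained just above each power of two (for instance at $x = 1+u$), which is the sawtooth peak visible at the start of every binade.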

Theorems & Definitions (27)

Lemma 2.1: Sterbenz' Lemma [Ste74]
Lemma 2.2: Exact representation of the square root remainder [BolDau03a]
Lemma 2.3: Dekker-Knuth's bound [Knu98]
Lemma 2.4: Jeannerod-Rump bounds [JeannerodRump2018]
Example 3.1
Theorem 5.1
Theorem 6.1
Proposition 6.2
proof
Theorem 7.1
...and 17 more
