Effective Quadratic Error Bounds for Floating-Point Algorithms Computing the Hypotenuse Function

Jean-Michel Muller, Bruno Salvy

TL;DR

This work develops generic, analytic quadratic error bounds for floating-point (FP) algorithms that compute basic functions, notably the hypotenuse. By recasting error propagation as a polynomial optimization problem and exploiting the discrete structure of FP numbers, the authors produce bounds of the form $\alpha u+\beta u^2$ that hold for all $u$ up to a specified maximum, often with an asymptotically optimal linear term and a tight quadratic term. The approach combines step-by-step FP analysis, gradient-based optimization on triangular polynomial systems, sign tests, and regular chains. It is implemented in a Maple prototype and applied to five hypotenuse algorithms (NaiveHypot, simplest-scaling, Beebe-alg, Borges-fused, and Kahan). For small, building-block programs, the resulting bounds are tighter than those produced by existing tools (Gappa, Satire), and they come with explicit proofs and runnable Maple worksheets, which highlights the potential of computer-aided proofs in numerical analysis. The work supports reliable, reusable error analysis for low-precision FP formats, with practical impact on algorithm selection, software correctness, and hardware-aware numerical computing.

Abstract

We provide tools to help automate the error analysis of algorithms that evaluate simple functions over the floating-point numbers. The aim is to obtain tight relative error bounds for these algorithms, expressed as a function of the unit round-off. Due to the discrete nature of the set of floating-point numbers, the largest errors are often intrinsically "arithmetic" in the sense that their appearance may depend on specific bit patterns in the binary representations of intermediate variables, which may be present only for some precisions. We focus on generic (i.e., parameterized by the precision) and analytic over-estimations that still capture the correlations between the errors made at each step of the algorithms. Using methods from computer algebra, which we adapt to the particular structure of the polynomial systems that encode the errors, we obtain bounds with a linear term in the unit round-off that is sharp in many cases. An explicit quadratic bound is given, rather than the $O()$-estimate that is more common in this area. This is particularly important when using low-precision formats, which are increasingly common in modern processors. Using this approach, we compare five algorithms for computing the hypotenuse function, ranging from elementary to quite challenging.
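To make "relative error expressed as a function of the unit round-off" concrete, the sketch below measures the error of the most elementary hypotenuse formula, $\sqrt{x^2+y^2}$, in binary64 and reports it as a multiple of $u = 2^{-53}$. It is an empirical illustration only, not the paper's Maple analysis. The function names, the sampling range $[1,2)$, and the 60-digit reference precision are assumptions chosen for the example.

```python
import math
import random
from decimal import Decimal, getcontext

# Reference computations use 60 significant digits, far beyond binary64.
getcontext().prec = 60

U = 2.0 ** -53  # unit round-off of binary64 (p = 53)


def naive_hypot(x: float, y: float) -> float:
    """sqrt(x*x + y*y), with every operation rounded to nearest."""
    return math.sqrt(x * x + y * y)


def relative_error_in_u(x: float, y: float) -> float:
    """Relative error of naive_hypot(x, y), expressed as a multiple of U."""
    dx, dy = Decimal(x), Decimal(y)       # float -> Decimal is exact
    exact = (dx * dx + dy * dy).sqrt()    # accurate to about 60 digits
    computed = Decimal(naive_hypot(x, y))
    return float(abs(computed - exact) / exact / Decimal(U))


def worst_observed(samples: int = 20000, seed: int = 0) -> float:
    """Largest sampled error (in units of U) for x, y drawn from [1, 2)."""
    rng = random.Random(seed)
    return max(
        relative_error_in_u(rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0))
        for _ in range(samples)
    )


if __name__ == "__main__":
    print(f"worst sampled error: {worst_observed():.4f} u")
```

Random sampling of this kind only gives a lower estimate of the worst case. A guaranteed bound of the form $\alpha u+\beta u^2$, valid for every input and every precision, is what the analysis described in the abstract is meant to provide.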

Paper Structure (86 sections, 17 theorems, 153 equations, 1 figure, 4 tables, 5 algorithms)

This paper contains 86 sections, 17 theorems, 153 equations, 1 figure, 4 tables, 5 algorithms.

Key Result

Lemma 2.1

If $a$ and $b$ are floating-point numbers satisfying $a/2 \leq b \leq 2a$ then $b-a$ is a floating-point number, which implies $\textnormal{RN}(b-a) = b-a$.
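The lemma can be checked numerically in binary64 by comparing the rounded difference with the exact rational difference. The sketch below is a sanity check under assumed sampling ranges, not a proof; the helper names are hypothetical.

```python
import random
from fractions import Fraction


def subtraction_is_exact(a: float, b: float) -> bool:
    """True when the rounded result of b - a equals the exact rational difference."""
    return Fraction(b - a) == Fraction(b) - Fraction(a)


def check_sterbenz(samples: int = 10000, seed: int = 0) -> bool:
    """Test random pairs that satisfy a/2 <= b <= 2a."""
    rng = random.Random(seed)
    for _ in range(samples):
        a = rng.uniform(1.0, 1024.0)
        b = rng.uniform(a / 2, 2 * a)
        # Clamp so the hypothesis holds even if uniform() rounds past an endpoint.
        b = min(max(b, a / 2), 2 * a)
        if not subtraction_is_exact(a, b):
            return False
    return True


if __name__ == "__main__":
    print(check_sterbenz())                       # pairs inside the hypothesis
    print(subtraction_is_exact(1.0, 2.0 ** -60))  # pair outside the hypothesis
```

The second call uses a pair that violates $a/2 \leq b$: the exact difference $2^{-60} - 1$ needs more than 53 significant bits, so the subtraction has to round.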

Figures (1)

  • Figure 1: Left: absolute error (in ulps) of rounding to nearest $x \in [\frac{1}{2},16]$. Right: relative error (in multiples of $u=2^{-p}$) of rounding to nearest $x \in [\frac{1}{2},16]$. Both pictures assume a binary floating-point system with $p=5$.
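The quantity plotted in the right panel can be reproduced with exact rational arithmetic. The following sketch models a toy binary format with $p=5$ and an unbounded exponent range, rounds to nearest with ties to even, and scans a grid of $[\frac{1}{2},16]$ for the largest relative error. The grid spacing of $1/64$ and the function names are assumptions of this example. The comparison value $u/(1+u)$ is the classical bound on the relative error of rounding to nearest, presumably the statement that the entry "Dekker-Knuth's bound" in the list below refers to.

```python
from fractions import Fraction

P = 5                    # precision of the toy format
U = Fraction(1, 2 ** P)  # unit round-off u = 2^-p


def round_to_nearest(x: Fraction, p: int = P) -> Fraction:
    """Round x > 0 to the nearest p-bit binary float (ties to even)."""
    e = 0
    # Find the exponent e such that 2^e <= x < 2^(e+1).
    while Fraction(2) ** (e + 1) <= x:
        e += 1
    while Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)  # spacing of floats in [2^e, 2^(e+1))
    q = x / ulp
    n = q.numerator // q.denominator  # floor of x / ulp
    rem = q - n
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and n % 2 == 1):
        n += 1
    return n * ulp


def max_relative_error(lo: Fraction, hi: Fraction, steps: int) -> Fraction:
    """Largest relative rounding error on a uniform grid of steps + 1 points."""
    worst = Fraction(0)
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        worst = max(worst, abs(round_to_nearest(x) - x) / x)
    return worst


if __name__ == "__main__":
    worst = max_relative_error(Fraction(1, 2), Fraction(16), 992)  # spacing 1/64
    print("worst relative error:", worst)
    print("u / (1 + u):         ", U / (1 + U))
```

Working with `Fraction` keeps every quantity exact, so the comparison with $u/(1+u)$ is not blurred by the rounding of the host arithmetic.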

Theorems & Definitions (27)

  • Lemma 2.1: Sterbenz' Lemma [Ste74]
  • Lemma 2.2: Exact representation of the square root remainder [BolDau03a]
  • Lemma 2.3: Dekker-Knuth's bound [Knu98]
  • Lemma 2.4: Jeannerod-Rump bounds [JeannerodRump2018]
  • Example 3.1
  • Theorem 5.1
  • Theorem 6.1
  • Proposition 6.2
  • proof
  • Theorem 7.1
  • ...and 17 more