Evaluation of Bfloat16, Posit, and Takum Arithmetics in Sparse Linear Solvers

Laslo Hunhold; James Quinlan

TL;DR

This paper evaluates non-IEEE number formats in sparse linear solvers at realistic scale, aiming to understand their numerical behavior in widely used solvers rather than in algorithms tailored to a particular format. It conducts a large-scale assessment on matrices from the SuiteSparse Matrix Collection and faithfully reproduces the UMFPACK LU and SPQR QR solvers for formats beyond single- and double-precision IEEE 754. It also examines GMRES with ILU(0) preconditioning and mixed-precision iterative refinement (MPIR), and it introduces 8-bit posits and takums to MPIR. The study finds that tapered-precision formats, especially takums, are more stable and often more accurate than bfloat16, and that takums in many cases match or outperform posits. These findings suggest takums as a strong candidate to replace bfloat16 in high-performance sparse computations and point to robust mixed-precision workflows in scientific computing that are less sensitive to dynamic range.

Abstract

Solving sparse linear systems lies at the core of numerous computational applications. Consequently, understanding the performance of recently proposed alternatives to the established IEEE 754 floating-point numbers, such as bfloat16 and the tapered-precision posit and takum machine number formats, is of significant interest. This paper examines these formats in the context of widely used solvers, namely LU, QR, and GMRES, with incomplete LU preconditioning and mixed-precision iterative refinement (MPIR). This contrasts with the prevailing emphasis on designing specialized algorithms tailored to new arithmetic formats. The paper presents an extensive and unprecedented evaluation based on the SuiteSparse Matrix Collection, a dataset of real-world matrices with diverse sizes and condition numbers. A key contribution is the faithful reproduction of SuiteSparse's UMFPACK multifrontal LU factorization and SPQR multifrontal QR factorization for machine number formats beyond single- and double-precision IEEE 754. Tapered-precision posit and takum formats show better accuracy in direct solvers and reduced iteration counts in indirect solvers. Takum arithmetic, in particular, exhibits exceptional stability, even at low precision.
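To make the MPIR idea mentioned above concrete, the following is a minimal sketch and not the paper's implementation. It assumes a small dense matrix, an LU factorization with partial pivoting whose operations are rounded to an emulated bfloat16, and residuals and updates computed in double precision. The helper names (`to_bfloat16`, `lu_factor_low`, `lu_solve`, `mpir`), the tolerance, and the test system are all hypothetical choices made for illustration; the paper works with sparse multifrontal solvers, which are not reproduced here.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to an emulated bfloat16 value (returned as a Python float).

    Assumption: x fits into float32 (struct raises OverflowError otherwise);
    NaN handling is omitted. Going double -> float32 -> bfloat16 rounds twice,
    which is a simplification.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 discarded low bits.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def lu_factor_low(A, rnd):
    """Dense LU with partial pivoting; every result is rounded with rnd."""
    n = len(A)
    LU = [[rnd(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular in the low-precision format")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = rnd(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = rnd(LU[i][j] - rnd(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve with the stored factors; the substitutions run in double precision."""
    n = len(LU)
    y = [b[perm[i]] for i in range(n)]
    for i in range(n):  # forward substitution, unit lower triangle
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):  # backward substitution, upper triangle
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


def mpir(A, b, rnd=to_bfloat16, tol=1e-12, max_iter=50):
    """Return (x, refinement_steps); steps is None if tol was not reached."""
    n = len(A)
    LU, perm = lu_factor_low(A, rnd)  # factor once in low precision
    x = lu_solve(LU, perm, b)
    norm_b = max(abs(v) for v in b)
    for step in range(max_iter + 1):
        # Residual in double precision against the original matrix.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * norm_b:
            return x, step
        d = lu_solve(LU, perm, r)  # correction from the low-precision factors
        x = [x[i] + d[i] for i in range(n)]
    return x, None


# Hypothetical well-conditioned test system with exact solution [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
x, steps = mpir(A, b)
print(x, steps)
```

Passing a different rounding function as `rnd` would let the same loop emulate another format; a posit or takum rounding routine would have to be supplied separately and is not shown here.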

Paper Structure (10 sections, 6 figures)

This paper contains 10 sections, 6 figures.

Introduction
Experimental Methods
Test Matrices Generation
Common Solver Experiment Interface
LU Solver
QR Solver
Mixed Precision Iterative Refinement (MPIR) Solver
Incomplete LU Preconditioned GMRES Solver
Results
Conclusion

Figures (6)

Figure 1: Dynamic range relative to the bit string length $n$ for linear takum, posit and a selection of floating-point formats.
Figure 2: Cumulative distribution of test matrix $L^1$ condition numbers.
Figure 3: Cumulative error distribution of the relative errors of the solutions of the linear systems via fully pivoted LU decomposition using a range of machine number types. The symbol $\infty_\sigma$ denotes where the conversion of the matrix to the target number type turned it singular, $\infty_\omega$ denotes where the dynamic range of the matrix entries exceeded the target number type.
Figure 4: Cumulative error distribution of the relative errors of the solutions of the linear systems via QR decomposition using a range of machine number types. The symbol $\infty_\sigma$ denotes where the conversion of the matrix to the target number type turned it singular, $\infty_\omega$ denotes where the dynamic range of the matrix entries exceeded the target number type.
Figure 5: Cumulative distribution of the MPIR iteration counts using a range of machine number types. The symbol $\infty_\sigma$ denotes where the initial low-precision LU decomposition yielded a singular system, $\infty_\omega$ denotes where the maximum iteration count was reached without the residual going below the desired relative tolerance.
...and 1 more figure
