Evaluation of Bfloat16, Posit, and Takum Arithmetics in Sparse Linear Solvers
Laslo Hunhold, James Quinlan
TL;DR
This paper evaluates non-IEEE number formats in sparse linear solvers at realistic scale, aiming to understand their numerical performance independent of algorithm-specific tailoring. It conducts a large-scale, format-agnostic assessment on SuiteSparse matrices and faithfully reproduces the UMFPACK LU and SPQR QR solvers for formats beyond single- and double-precision IEEE 754, while also exploring GMRES with ILU(0) and mixed-precision iterative refinement (MPIR). The study finds that tapered-precision formats, especially takum arithmetic, are more stable and often more accurate than bfloat16, with takums matching or outperforming posits in many cases; it also introduces 8-bit posits and takums for MPIR. These findings suggest takums as a strong candidate to replace bfloat16 in high-performance sparse computations and point to robust mixed-precision workflows in scientific computing that are insensitive to dynamic range.
Abstract
Solving sparse linear systems lies at the core of numerous computational applications. Consequently, understanding the performance of recently proposed alternatives to the established IEEE 754 floating-point numbers, such as bfloat16 and the tapered-precision posit and takum machine number formats, is of significant interest. This paper examines these formats in the context of widely used solvers, namely LU, QR, and GMRES, with incomplete LU preconditioning and mixed-precision iterative refinement (MPIR). This approach contrasts with the prevailing emphasis on designing specialized algorithms tailored to new arithmetic formats. The paper presents an extensive and unprecedented evaluation based on the SuiteSparse Matrix Collection, a dataset of real-world matrices with diverse sizes and condition numbers. A key contribution is the faithful reproduction of SuiteSparse's UMFPACK multifrontal LU factorization and SPQR multifrontal QR factorization for machine number formats beyond single- and double-precision IEEE 754. Tapered-precision posit and takum formats show better accuracy in direct solvers and reduced iteration counts in indirect solvers. Takum arithmetic, in particular, exhibits exceptional stability, even at low precision.
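To make the idea of mixed-precision iterative refinement concrete, the following is a minimal dense sketch, not the paper's implementation and not UMFPACK. It assumes that bfloat16 arithmetic can be approximated by rounding every intermediate result to the nearest bfloat16 value (via float32, so rare double-rounding effects are possible, and only finite values within float32 range are handled). The names `bf16`, `lu_factor`, `lu_solve`, and `refine` are hypothetical helpers introduced for illustration. The factorization and the correction solves run in emulated bfloat16, while the residual and the solution update use Python's double-precision floats.

```python
import struct


def bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round-to-nearest-even on the low 16 bits
    bits &= 0xFFFF0000                   # keep sign, 8 exponent bits, 7 fraction bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def lu_factor(A, rnd):
    """In-place style LU with partial pivoting; every result is passed through rnd."""
    n = len(A)
    LU = [[rnd(v) for v in row] for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))  # pivot row
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] = rnd(LU[i][k] / LU[k][k])            # multiplier (entry of L)
            for j in range(k + 1, n):
                LU[i][j] = rnd(LU[i][j] - rnd(LU[i][k] * LU[k][j]))
    return LU, perm


def lu_solve(LU, perm, b, rnd):
    """Solve using the stored factors, again rounding every operation with rnd."""
    n = len(LU)
    y = [rnd(b[p]) for p in perm]
    for i in range(n):                 # forward substitution with unit lower L
        for j in range(i):
            y[i] = rnd(y[i] - rnd(LU[i][j] * y[j]))
    for i in reversed(range(n)):       # back substitution with U
        for j in range(i + 1, n):
            y[i] = rnd(y[i] - rnd(LU[i][j] * y[j]))
        y[i] = rnd(y[i] / LU[i][i])
    return y


def refine(A, b, iters=10):
    """Iterative refinement: low-precision factors, double-precision residuals."""
    n = len(A)
    LU, perm = lu_factor(A, bf16)       # factorize once in emulated bfloat16
    x = lu_solve(LU, perm, b, bf16)     # initial low-precision solution
    for _ in range(iters):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        d = lu_solve(LU, perm, r, bf16)  # correction from the same factors
        x = [x[i] + d[i] for i in range(n)]
    return x


if __name__ == "__main__":
    # Small, well-conditioned test system with exact solution [1, 2, 3].
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]
    print(refine(A, b))
```

Because `lu_factor` and `lu_solve` take the rounding function as a parameter, the same sketch could in principle be rerun with a posit or takum rounding routine in place of `bf16`; no such routine is provided here. The sketch performs no singularity check and makes no attempt at sparsity, so it illustrates only the structure of the refinement loop.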
