Near-Linear Runtime for a Classical Matrix Preconditioning Algorithm
Xufeng Cai, Jason M. Altschuler, Jelena Diakonikolas
TL;DR
The paper proves a near-linear runtime bound for Osborne's original matrix balancing algorithm by recasting it as exact cyclic coordinate descent on a convex objective $\varphi(\mathbf{u})$. Combining a descent lemma with an imbalance bound, it shows that Osborne's algorithm makes substantial progress over each full cycle, so that ${\mathcal{O}}\big( \frac{\log \kappa}{\varepsilon} \min\{\frac{1}{\varepsilon}, d\} \big)$ cycles, or equivalently ${\mathcal{O}}\big( m \frac{\log \kappa}{\varepsilon} \min\{\frac{1}{\varepsilon}, d\} \big)$ arithmetic operations, suffice to reach an $\varepsilon$-balancing. The result is deterministic, near-optimal in the sparsity parameter $m$ up to a logarithm, and the first bound for Osborne's algorithm that does not dominate the runtime of downstream eigenvalue computations. The analysis yields additional benefits, including stability under low-bit arithmetic, applicability to arbitrary or changing update orders, and extensions to random-reshuffle and parallel variants. Empirical illustrations indicate that Osborne's cyclic approach remains competitive or superior in practice, which gives theoretical justification for its long-standing use as the default matrix-balancing preconditioner.
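To make the coordinate-descent view concrete, the following is a minimal sketch of a cyclic Osborne iteration. It is an illustration under stated assumptions, not the paper's implementation. The summary above does not spell out the objective or the balancing criterion, so the sketch assumes the standard $\ell_1$ formulation: the scaled matrix is $A = \mathrm{diag}(e^{\mathbf{u}})\, K\, \mathrm{diag}(e^{-\mathbf{u}})$, an update of coordinate $k$ equalizes the $k$-th off-diagonal row and column sums, and $A$ is treated as $\varepsilon$-balanced when the $\ell_1$ distance between its row sums and column sums is at most $\varepsilon$ times the sum of its entries. The names `osborne_cyclic`, `imbalance`, `K`, `eps`, and `max_cycles` are hypothetical.

```python
import numpy as np


def imbalance(A):
    """Assumed criterion: ||row sums - column sums||_1 / (sum of all entries)."""
    return np.abs(A.sum(axis=1) - A.sum(axis=0)).sum() / A.sum()


def osborne_cyclic(K, eps=1e-3, max_cycles=10_000):
    """Sketch of cyclic Osborne balancing for a nonnegative square matrix K.

    Returns (u, A, cycles), where A = diag(exp(u)) @ K @ diag(exp(-u)).
    Assumes K is balanceable (for example, all entries positive).
    """
    A = np.array(K, dtype=float)  # working copy, rescaled in place
    d = A.shape[0]
    u = np.zeros(d)
    for cycle in range(max_cycles):
        if imbalance(A) <= eps:
            return u, A, cycle
        for k in range(d):  # one cycle: update coordinates in a fixed order
            r = A[k, :].sum() - A[k, k]  # off-diagonal row sum
            c = A[:, k].sum() - A[k, k]  # off-diagonal column sum
            if r == 0.0 or c == 0.0:
                continue  # this coordinate cannot be balanced; skip it
            # Exact coordinate step: choose delta so that r*e^delta == c*e^-delta.
            delta = 0.5 * (np.log(c) - np.log(r))
            u[k] += delta
            s = np.exp(delta)
            A[k, :] *= s  # row k is scaled up by e^delta
            A[:, k] /= s  # column k is scaled down; A[k, k] is unchanged
    return u, A, max_cycles


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = rng.uniform(0.1, 10.0, size=(5, 5))
    u, A, cycles = osborne_cyclic(K, eps=1e-6)
    print("cycles:", cycles, "imbalance:", imbalance(A))
```

This dense version spends $O(d^2)$ work per cycle. The $O(m)$ per-cycle cost implied by the bounds above would require a sparse representation that only touches the nonzero entries of each row and column.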
Abstract
In 1960, Osborne proposed a simple iterative algorithm for matrix balancing with outstanding numerical performance. Today, it is the default preconditioning procedure before eigenvalue computation and other linear algebra subroutines in mainstream software packages such as Python, Julia, MATLAB, EISPACK, LAPACK, and more. Despite its widespread usage, Osborne's algorithm has long resisted theoretical guarantees for its runtime: the first polynomial-time guarantees were obtained only in the past decade, and recent near-linear runtimes remain confined to variants of Osborne's algorithm with important differences that make them simpler to analyze but empirically slower. In this paper, we address this longstanding gap between theory and practice by proving that Osborne's original algorithm -- the de facto preconditioner in practice -- in fact has a near-linear runtime. This runtime guarantee (1) is optimal in the input size up to at most a single logarithm, (2) is the first runtime for Osborne's algorithm that does not dominate the runtime of downstream tasks like eigenvalue computation, and (3) improves upon the theoretical runtimes for all other variants of Osborne's algorithm.
