Attention Mechanisms Through the Lens of Numerical Methods: Approximation Methods and Alternative Formulations

Michel Fabrice Serret, Alice Cortinovis, Yijun Dong, Diana Halikias, Anna Ma, Fabio Matti, Deanna Needell, Katherine J. Pearce, Elizaveta Rebrova, Disha Shur, Rudi Smith, Hai-Xiao Wang, Laura Grigori

Abstract

The attention mechanism is the computational core of modern Transformer architectures, but its quadratic complexity in the input sequence length is the bottleneck for large-scale inference. This has motivated a rapidly growing body of work aimed at accelerating attention through approximation and reformulation. In this survey, we revisit attention mechanisms through the lens of numerical analysis, with a particular emphasis on tools and perspectives from numerical linear algebra. Our goal is twofold: first, we aim to systematically review and classify fast approximation methods according to the numerical principles they exploit. These include sparsity and clustering approaches, low-rank and subspace projection techniques, randomized sketching methods, and tensor-based decompositions. We also discuss kernel-inspired reformulations of attention and recent architectural variants, such as Latent Attention, that modify the standard softmax formulation to improve efficiency. Second, by presenting these developments within a unified mathematical framework, we aim to bridge the gap between disciplines and highlight opportunities for further contributions from computational mathematics, particularly numerical linear algebra, to the design of scalable attention mechanisms.
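
As a point of reference for the discussion, the following is a minimal NumPy sketch of standard scaled dot-product (softmax) attention; the explicit $n \times n$ score matrix it forms is the source of the quadratic cost in the sequence length that motivates the approximation methods surveyed here. The function and variable names are illustrative only and are not tied to any specific implementation reviewed below.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention.

    Q, K: (n, d) query/key matrices; V: (n, d_v) value matrix.
    Forms the full n x n score matrix, which is what makes the
    cost quadratic in the sequence length n.
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                   # (n, n): O(n^2 d) work, O(n^2) memory
    scores -= scores.max(axis=1, keepdims=True)     # shift for numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)               # row-wise softmax
    return A @ V                                    # (n, d_v)

# Toy example: 8 tokens, head dimension 4 (random placeholder data)
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
out = softmax_attention(Q, K, V)
```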

Paper Structure

This paper contains 71 sections, 128 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Example of a sentence embedding produced by a word-based tokenizer for the phrase 'the lazy dog'.
  • Figure 2: Example of an attention layer applied to the embeddings obtained from the tokenization example in Figure 1. Note that in this example the kernel is taken to be linear for simplicity.
  • Figure 3: GPT architecture (Radford et al., 2018): each layer of the GPT Decoder contains multiple submodules, only one of which is a masked Multi-Headed Attention submodule.
  • Figure 4: Overview of approximation techniques for self-attention.
  • Figure 5: Approximate sparsity pattern of the top $20 \times 20$ block of the masked attention matrix $Z^{-1}A \in \mathbb{R}^{309 \times 309}$ corresponding to the $K$ and $Q$ matrices produced by the Llama 3.2 (1B) model with the HyperAttention (Han et al., 2023) abstract given as input, for different choices of heads and layers. The colors represent the magnitude of the entries, with lighter colors indicating larger entries; each row of the matrix has been normalized to have maximum element equal to $1$ (a minimal sketch of this construction is given after the figure list). It is interesting to note that some heads in specific layers exhibit so-called attention sinks (Xiao et al., 2024), visible as columns with consistently large weights, indicating tokens that attract attention across many query positions.
  • ...and 9 more figures
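
For readers who want to reproduce the kind of quantity shown in Figure 5, the following is a minimal NumPy sketch of the causally masked attention matrix $Z^{-1}A$ with the row-wise maximum normalization described in the caption. The Llama 3.2 (1B) weights and the actual input are not reproduced here; the randomly generated $Q$ and $K$ below are placeholders.

```python
import numpy as np

def masked_attention_matrix(Q, K):
    """Causally masked attention matrix Z^{-1} A, with each row
    rescaled so that its largest entry equals 1 (as in Figure 5)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # entries above the diagonal
    scores = np.where(mask, -np.inf, scores)            # causal mask: no attention to future tokens
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    P = A / A.sum(axis=1, keepdims=True)                 # Z^{-1} A: rows sum to 1
    return P / P.max(axis=1, keepdims=True)              # rescale each row to have maximum 1

# Placeholder data in place of the model-generated Q and K matrices
rng = np.random.default_rng(1)
Q, K = rng.standard_normal((309, 64)), rng.standard_normal((309, 64))
M = masked_attention_matrix(Q, K)   # columns with consistently large entries suggest attention sinks
```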