On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication
Tadisetty Sai Yashwanth
TL;DR
This work investigates floating-point non-determinism in GPU matrix multiplication and its dependence on batch context. By contrasting a single-input matmul with a batched version, the paper tests the common i.i.d. Gaussian noise assumption and shows that, although average perturbations are small, the induced noise is highly structured and correlated rather than independent. It formalizes two hypotheses, derives analytical flip-probability baselines, and provides a covariance-based analysis that reveals substantial off-diagonal noise energy (up to 47.22% for float16). These findings challenge the prevailing i.i.d. stochastic view of numerical noise under hardware non-determinism and motivate structured robustness approaches for reliable large-scale models.
Abstract
Floating-point non-associativity makes fundamental deep learning operations, such as matrix multiplication (matmul) on GPUs, inherently non-deterministic. Despite this, the statistical structure of the resulting numerical error remains poorly understood. A common working assumption is that these errors behave as independent and identically distributed (i.i.d.) Gaussian noise. In this paper, we empirically test this assumption and show that it fails to describe real GPU behavior. By comparing outputs of single-input and batched matmuls, we find that while the i.i.d. model predicts non-zero output instability, empirical results show a 0.00% prediction flip rate. Through covariance analysis, we uncover the cause: the floating-point error is structured and highly correlated. For float16, nearly 50% of the total error variance lies in off-diagonal terms, revealing that the noise behaves as a coordinated, directional perturbation rather than random static. This result challenges the prevailing stochastic view of numerical noise and provides a principled foundation for analyzing deep learning reliability under hardware non-determinism.
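To make the off-diagonal claim concrete, the following is a minimal sketch of one way such a statistic could be computed. It is an illustration under stated assumptions, not the paper's implementation: the metric is assumed here to be the share of the squared Frobenius norm of the noise covariance matrix carried by off-diagonal entries, the noise samples are synthetic rather than measured GPU differences, and the names `covariance` and `off_diagonal_energy` are hypothetical.

```python
import random


def covariance(samples):
    """Sample covariance matrix (d x d) of n noise vectors of length d."""
    n, d = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for row in samples:
        centered = [row[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += centered[i] * centered[j]
    return [[value / (n - 1) for value in r] for r in cov]


def off_diagonal_energy(cov):
    """Fraction of squared covariance mass lying off the diagonal.

    ASSUMPTION: this definition is chosen for illustration; the paper's
    exact statistic may differ.
    """
    total = sum(value * value for r in cov for value in r)
    if total == 0.0:
        return 0.0  # no noise at all, so nothing is off-diagonal
    diagonal = sum(cov[i][i] ** 2 for i in range(len(cov)))
    return (total - diagonal) / total


rng = random.Random(0)
n_samples, dim = 5000, 4

# Synthetic i.i.d. Gaussian noise: every output coordinate is independent.
iid_noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]

# Synthetic fully correlated noise: one shared draw moves all coordinates.
shared_noise = [[g] * dim for g in (rng.gauss(0.0, 1.0) for _ in range(n_samples))]

iid_fraction = off_diagonal_energy(covariance(iid_noise))
correlated_fraction = off_diagonal_energy(covariance(shared_noise))

print(f"i.i.d. noise off-diagonal fraction:     {iid_fraction:.4f}")
print(f"correlated noise off-diagonal fraction: {correlated_fraction:.4f}")
```

Under this assumed definition, independent noise gives a fraction close to zero, while fully shared noise in d dimensions gives (d^2 - d) / d^2, which is 0.75 for d = 4. In a real measurement, each noise vector would instead be the difference between the single-input and batched outputs for the same input.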
