Noise Stability of Transformer Models

Themistoklis Haris; Zihan Zhang; Yuichi Yoshida

TL;DR

This paper asks how to quantify simplicity bias in Transformers beyond Boolean average sensitivity. It introduces noise stability, a measure of robustness to correlated perturbations of all input coordinates that extends to real-valued domains via the Ornstein–Uhlenbeck framework. The authors prove theoretical results for single ReLU MLP layers and single attention layers, and extend the analysis to deep Transformers through a recurrence-based propagation and stability-interval analysis. They also propose a differentiable, data-dependent noise stability regularizer that catalyzes grokking and accelerates training by roughly 35% on synthetic algorithmic tasks (Noisy $k$-Sparse Parity and Modular Addition) and roughly 75% on next-token prediction with WikiText-2. The findings connect signal propagation theory with interpretability and offer a practical tool for shaping robustness and generalization in modern Transformer models. The framework also raises open questions about moment propagation and the limits of noise-stability-based regularization in large-scale language models.

Abstract

Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately $35\%$ and $75\%$ respectively. Our results sculpt a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.
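To make the notion of "correlated noise applied to all input coordinates simultaneously" concrete, the following is a minimal Monte Carlo sketch of the standard Gaussian (Ornstein–Uhlenbeck) noise stability $\mathbb{E}[f(x) f(y)]$, where $y = \rho x + \sqrt{1-\rho^2}\, z$ and $x, z$ are independent standard Gaussians. The function names (`noise_stability`, `linear`, `sign_parity`) and the two toy functions are illustrative assumptions, not taken from the paper, and the paper's own definition may be normalized differently.

```python
import math
import random


def noise_stability(f, n, rho, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[f(x) * f(y)] for rho-correlated Gaussian pairs.

    Assumption: x ~ N(0, I_n) and y = rho * x + sqrt(1 - rho^2) * z with z
    independent of x. This is the textbook Ornstein-Uhlenbeck construction,
    used here only as an illustration.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(num_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Every coordinate is perturbed at once, not a single token or bit.
        y = [rho * xi + scale * rng.gauss(0.0, 1.0) for xi in x]
        total += f(x) * f(y)
    return total / num_samples


def linear(x):
    """Hypothetical 'simple' function: depends on one coordinate, linearly."""
    return x[0]


def sign_parity(x):
    """Hypothetical 'complex' function: product of the signs of all coordinates."""
    p = 1.0
    for xi in x:
        p *= 1.0 if xi >= 0 else -1.0
    return p


if __name__ == "__main__":
    for name, f in [("linear", linear), ("sign_parity", sign_parity)]:
        print(name, round(noise_stability(f, n=4, rho=0.9), 3))
```

Under these assumptions the linear function should have stability close to $\rho$, while the parity-like function, which depends on every coordinate, should score noticeably lower at the same $\rho$. That ordering is the sense in which noise stability acts as a simplicity measure.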

Paper Structure (58 sections, 25 theorems, 172 equations, 14 figures, 1 table)

Introduction
Our Contributions
Related Work
Simplicity Bias in Deep Learning
Sensitivity Analysis in Transformers
Signal Propagation in Neural Architectures
Generalizing Sensitivity to Continuous Domains
Noise Stability and Sensitivity
Setup
Boolean Function Analysis
Models are often "simpler" than expected
Theoretical Drawbacks: Extending to Real-Valued Domains
Empirical Drawbacks: Mismatch with LLM Behavior
Noise Stability as a Measure of Concentration
Spectral Concentration Bounds: Sensitivity vs. Stability
...and 43 more sections

Key Result

Theorem 3.1

Let $f: \{-1, 1\}^n \to \{-1, 1\}$ be a Boolean function with total influence $I(f)$. For every $\epsilon > 0$, there exists a $k$-junta $g: \{-1, 1\}^n \to \{-1, 1\}$ such that $\mathbb{P}[f(x) \neq g(x)] \le \epsilon$ for uniformly random $x$, where the number of variables $k$ on which $g$ depends is bounded by $k \le 2^{O(I(f)/\epsilon)}$.
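As a small illustration of the quantities in this statement, the sketch below computes per-coordinate influences and their sum by brute force over $\{-1, 1\}^n$. It assumes that $I(f)$ denotes the total influence (average sensitivity) of a Boolean function $f$, which is the usual reading of this notation. The helper names and the two example functions are hypothetical and chosen only for demonstration.

```python
from itertools import product


def influences(f, n):
    """Per-coordinate influence: Pr_x[f(x) != f(x with coordinate i flipped)].

    Brute force over all 2^n inputs, so only suitable for small n.
    """
    counts = [0] * n
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if f(flipped) != fx:
                counts[i] += 1
    return [c / 2 ** n for c in counts]


def total_influence(f, n):
    """Total influence I(f): the sum of the per-coordinate influences."""
    return sum(influences(f, n))


def two_junta(x):
    """Hypothetical 2-junta: depends only on coordinates 0 and 1."""
    return x[0] * x[1]


def majority(x):
    """Majority vote; assumes an odd number of coordinates so ties cannot occur."""
    return 1 if sum(x) > 0 else -1


if __name__ == "__main__":
    n = 5
    print("two_junta influences:", influences(two_junta, n))
    print("majority total influence:", total_influence(majority, n))
```

A junta shows up as an influence vector that is zero outside a few coordinates, whereas majority spreads its influence evenly across all inputs. The theorem says that any function with small total influence can be approximated by a function of the first kind.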

Figures (14)

Figure 1: Comparing the per-coordinate geometric influence of three models for $n=256$.
Figure 2: Stability of Single Layer Attention (Identity and Unstructured)
Figure 3: Comparing $A_X$ and $A_Y$ for $d=128$ and $\rho = 0.01$.
Figure 4: Noise Stability Regularization accelerates training.
Figure 5: Noise Stability Regularization for Next-Token-Prediction (NTP) on WikiText-2
...and 9 more figures

Theorems & Definitions (48)

Theorem 3.1
Definition 4.1
Lemma 1: Spectral Concentration via Stability
Proof Sketch
Theorem 5.1
Proof
Theorem 5.2
Proof Sketch
Theorem 5.3
Definition 6.1: Noise Stability Regularization
...and 38 more
