Noise Stability of Transformer Models

Themistoklis Haris, Zihan Zhang, Yuichi Yoshida

TL;DR

This paper asks how to quantify simplicity bias in Transformers beyond Boolean average sensitivity. It introduces noise stability, a measure of robustness to correlated perturbations of all input coordinates that extends to real-valued domains via the Ornstein–Uhlenbeck framework. The authors develop theoretical results for single ReLU MLP layers and single attention layers, and extend them to deep Transformers through a recurrence-based propagation and stability-interval analysis. They also propose a differentiable, data-dependent noise stability regularizer and show that it catalyzes grokking and accelerates training by roughly 35% and 75%, respectively, on synthetic tasks (Noisy $k$-Sparse Parity and Modular Addition) and on next-token prediction with WikiText-2. The findings connect signal propagation theory with interpretability, offering a practical tool for shaping robustness and generalization in modern Transformer models. The framework opens avenues for deeper theoretical understanding of moment propagation and the limits of noise-stability-based regularization in large-scale language models.

Abstract

Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately $35\%$ and $75\%$ respectively. Our results sculpt a new connection between signal propagation in neural networks and interpretability, with noise stability emerging as a powerful tool for understanding and improving modern Transformers.
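To make "robustness to correlated noise applied to all input coordinates simultaneously" concrete, the following is a minimal Monte Carlo sketch. It assumes the textbook Gaussian form of noise stability, $\mathrm{Stab}_\rho(f) = \mathbb{E}[f(x) f(y)]$ with $x \sim \mathcal{N}(0, I_n)$ and $y = \rho x + \sqrt{1-\rho^2}\, z$ for an independent $z \sim \mathcal{N}(0, I_n)$; the paper's own Definition 4.1 may differ in normalization or setup. The function names (`correlated_copy`, `noise_stability`) and the two toy functions are illustrative and not taken from the paper.

```python
import numpy as np

def correlated_copy(x, rho, rng):
    """Perturb every coordinate at once: y = rho*x + sqrt(1-rho^2)*z, z ~ N(0, I)."""
    z = rng.standard_normal(x.shape)
    return rho * x + np.sqrt(1.0 - rho**2) * z

def noise_stability(f, n, rho, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[f(x) f(y)] for rho-correlated Gaussian pairs (x, y).

    Assumes f maps an array of shape (num_samples, n) to shape (num_samples,).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, n))
    y = correlated_copy(x, rho, rng)
    return float(np.mean(f(x) * f(y)))

# Two toy functions (hypothetical, for illustration only).
def linear(x):
    return x[:, 0]            # degree 1: exact stability is rho

def product(x):
    return x[:, 0] * x[:, 1]  # degree 2: exact stability is rho**2

if __name__ == "__main__":
    for rho in (0.9, 0.5, 0.1):
        print(rho, noise_stability(linear, 4, rho), noise_stability(product, 4, rho))
```

In this toy setting the lower-degree function keeps more of its correlation under noise ($\rho$ versus $\rho^2$), which is the sense in which higher noise stability can serve as a proxy for simplicity.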

Paper Structure

This paper contains 58 sections, 25 theorems, 172 equations, 14 figures, 1 table.

Key Result

Theorem 3.1

Let $f: \{-1, 1\}^n \to \{-1, 1\}$ be a Boolean function with total influence (average sensitivity) $I(f)$. For every $\epsilon > 0$, there exists a $k$-junta $g: \{-1, 1\}^n \to \{-1, 1\}$ such that $\mathbb{P}[f(x) \neq g(x)] \le \epsilon$, where the number of variables $k$ on which $g$ depends is bounded by $k \le 2^{O(I(f)/\epsilon)}$.
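The quantities in this statement can be checked by brute force on small examples. The sketch below assumes the standard definitions: $I(f) = \sum_i \mathbb{P}_x[f(x) \neq f(x^{\oplus i})]$ with $x$ uniform on $\{-1,1\}^n$ and $x^{\oplus i}$ denoting $x$ with coordinate $i$ flipped, and $\mathbb{P}[f(x) \neq g(x)]$ taken over uniform $x$. The helper names and the three toy functions are illustrative and not from the paper.

```python
import itertools
import numpy as np

def hypercube(n):
    """All 2^n points of {-1, 1}^n as an array of shape (2^n, n)."""
    return np.array(list(itertools.product([-1, 1], repeat=n)))

def total_influence(f, n):
    """Exact I(f): sum over coordinates of P[f(x) != f(x with coordinate i flipped)]."""
    x = hypercube(n)
    fx = f(x)
    total = 0.0
    for i in range(n):
        flipped = x.copy()
        flipped[:, i] *= -1
        total += np.mean(fx != f(flipped))
    return float(total)

def disagreement(f, g, n):
    """Exact P[f(x) != g(x)] under the uniform distribution on {-1, 1}^n."""
    x = hypercube(n)
    return float(np.mean(f(x) != g(x)))

# Toy Boolean functions that ignore all but the first few coordinates.
def parity3(x):
    return x[:, 0] * x[:, 1] * x[:, 2]

def majority3(x):
    return np.sign(x[:, 0] + x[:, 1] + x[:, 2])

def dictator(x):
    return x[:, 0]

if __name__ == "__main__":
    n = 6
    print(total_influence(parity3, n))          # 3-junta with influence 3
    print(total_influence(majority3, n))        # 3-junta with influence 1.5
    print(disagreement(majority3, dictator, n)) # a 1-junta that disagrees on 1/4 of inputs
```

Here `majority3` is itself a 3-junta, and the 1-junta `dictator` already approximates it to within $\epsilon = 1/4$; the theorem guarantees that such a small approximating junta exists whenever $I(f)$ is small, with $k$ independent of $n$.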

Figures (14)

  • Figure 1: Comparing the per-coordinate geometric influence of three models for $n=256$.
  • Figure 2: Stability of Single Layer Attention (Identity and Unstructured)
  • Figure 3: Comparing $A_X$ and $A_Y$ for $d=128$ and $\rho = 0.01$.
  • Figure 4: Noise Stability Regularization accelerates training.
  • Figure 5: Noise Stability Regularization for Next-Token-Prediction (NTP) on WikiText-2
  • ...and 9 more figures

Theorems & Definitions (48)

  • Theorem 3.1
  • Definition 4.1
  • Lemma 1: Spectral Concentration via Stability
  • Proof Sketch
  • Theorem 5.1
  • Proof
  • Theorem 5.2
  • Proof Sketch
  • Theorem 5.3
  • Definition 6.1: Noise Stability Regularization
  • ...and 38 more