Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences
Siquan Li, Yao Tong, Haonan Wang, Tianyang Hu
TL;DR
The paper shows that randomly initialized Transformers are not neutral: they harbor strong, seed-specific biases that skew next-token predictions even before any training. It develops a mechanistic account in which inter-sequence contraction, driven by asymmetric MLP activations, and intra-sequence contraction, driven by self-attention, align representations along a seed-determined direction, producing pronounced top-token preferences. Crucially, these initialization-induced biases persist through training, enabling SeedPrint, a fingerprinting method capable of distinguishing models by birth seed even under distribution shift. The authors further connect a positional variance discrepancy in attention to the attention-sink phenomenon and demonstrate practical architectural mitigations: variance calibration strategies that reduce sinks without sacrificing language modeling performance. Together, these results shift focus from what models learn to what they are born with, offering new tools for model attribution and stability control in large-scale LLMs.
Abstract
Transformers underpin modern large language models (LLMs) and are commonly assumed to be behaviorally unstructured at random initialization, with all meaningful preferences emerging only through large-scale training. We challenge this assumption by showing that randomly initialized transformers already exhibit strong and systematic structural biases. In particular, untrained models display extreme token preferences: across random input sequences, certain tokens are predicted with probabilities orders of magnitude larger than those of other tokens. We provide a mechanistic explanation for this phenomenon by dissecting the transformer architecture at initialization. We show that extreme token preference arises from a contraction of token representations along a random seed-dependent direction. This contraction is driven by two interacting forces: (i) asymmetric nonlinear activations in MLP sublayers induce global (inter-sequence) representation concentration, and (ii) self-attention further amplifies this effect through local (intra-sequence) aggregation. Together, these mechanisms align hidden representations along a direction determined solely by the random initialization, producing highly non-uniform next-token predictions. Beyond mechanistic insight, we demonstrate that these initialization-induced biases persist throughout training, forming a stable and intrinsic model identity. Leveraging this property, we introduce SeedPrint, a fingerprinting method that can reliably distinguish models that differ only in their random initialization, even after extensive training and under substantial distribution shift. Finally, we identify a fundamental positional discrepancy inherent to the attention mechanism's intra-sequence contraction that is causally linked to the attention-sink phenomenon. This discovery provides a principled explanation for the emergence of sinks and offers a pathway for their control.
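
The first mechanism in the abstract, contraction induced by an asymmetric activation, can be illustrated with a toy experiment. The sketch below is not the authors' setup: it uses a single randomly initialized two-layer MLP with no attention, residual connections, or normalization, and the helper names (`random_mlp`, `mlp_outputs`, `mean_pairwise_cosine`, `mean_direction`) and all sizes are assumptions chosen for illustration. It compares an asymmetric activation (ReLU) with an odd-symmetric one (tanh) on independent Gaussian inputs, and checks whether the shared output direction changes with the weight seed.

```python
import numpy as np

def random_mlp(seed, d=64, hidden=256):
    """Draw Gaussian weights for a toy two-layer MLP (illustrative sizes)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, d))
    return w1, w2

def mlp_outputs(seed, activation, n=256, d=64, input_seed=1234):
    """Pass n independent Gaussian inputs through the seed-specific MLP."""
    w1, w2 = random_mlp(seed, d)
    # The inputs use a fixed, separate seed so only the weights vary with `seed`.
    x = np.random.default_rng(input_seed).normal(size=(n, d))
    return activation(x @ w1) @ w2

def mean_pairwise_cosine(h):
    """Average cosine similarity over all distinct pairs of rows."""
    u = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = u @ u.T
    n = len(u)
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))

def mean_direction(h):
    """Unit vector along the average output, i.e. the shared direction."""
    m = h.mean(axis=0)
    return m / np.linalg.norm(m)

def relu(z):
    return np.maximum(z, 0.0)

relu_seed0 = mlp_outputs(0, relu)
relu_seed1 = mlp_outputs(1, relu)
tanh_seed0 = mlp_outputs(0, np.tanh)

cos_relu = mean_pairwise_cosine(relu_seed0)   # outputs share a common component
cos_tanh = mean_pairwise_cosine(tanh_seed0)   # odd activation: no shared component expected
seed_overlap = float(mean_direction(relu_seed0) @ mean_direction(relu_seed1))

print(f"mean pairwise cosine, ReLU: {cos_relu:.3f}")
print(f"mean pairwise cosine, tanh: {cos_tanh:.3f}")
print(f"overlap of shared directions across two seeds: {seed_overlap:.3f}")
```

In this toy setting, ReLU gives every hidden unit a positive mean, so outputs for unrelated inputs share a component whose direction is fixed by the output weights and therefore by the seed. Tanh is odd, so that shared component averages out. How strongly this carries over to full transformers with attention and normalization is the subject of the paper, not something this sketch establishes.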
