Outlier Dimensions Encode Task-Specific Knowledge
William Rudman, Catherine Chen, Carsten Eickhoff
TL;DR
The paper investigates why a small set of outlier dimensions, whose variance far exceeds the average variance across dimensions, dominates LLM embeddings, and how fine-tuning affects these dimensions. It demonstrates that outlier dimensions from pre-training persist after fine-tuning across tasks and that, in some models, a single dimension (the principal outlier ρ) encodes enough task-specific knowledge to perform downstream classification with a simple linear threshold. Using activation diagrams and a brute-force 1-D analysis, the authors find a robust correlation between a dimension's variance and its 1-D task performance. They also show that non-principal outliers can achieve high accuracy and occasionally outperform the full representations. These findings challenge the view that outlier dimensions are universally detrimental and highlight the relevance of low-dimensional subspaces for transfer learning and interpretability, with implications for efficient representation and model design.
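To make the "simple linear threshold" idea concrete, here is a minimal sketch of a brute-force 1-D analysis for a binary task. It is not the authors' code. The function names (`best_threshold_accuracy`, `brute_force_1d`), the choice of midpoints as candidate thresholds, and the synthetic data are all assumptions made for illustration. For every dimension, it finds the single cut point that best separates the two classes and records that accuracy alongside the dimension's variance.

```python
import numpy as np


def best_threshold_accuracy(values, labels):
    """Best accuracy of a 1-D threshold rule on binary labels (0/1).

    Tries every midpoint between sorted values, plus one cut below the
    minimum and one above the maximum, in both orientations.
    Returns (accuracy, threshold, sign), where sign=1 means
    "predict 1 when value > threshold" and sign=-1 means the reverse.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=int)
    v = np.sort(values)
    candidates = np.concatenate(
        ([v[0] - 1.0], (v[:-1] + v[1:]) / 2.0, [v[-1] + 1.0])
    )
    best_acc, best_t, best_sign = 0.0, float(candidates[0]), 1
    for t in candidates:
        acc = float(np.mean((values > t).astype(int) == labels))
        # Flipping the orientation of the rule flips every prediction.
        for sign, a in ((1, acc), (-1, 1.0 - acc)):
            if a > best_acc:
                best_acc, best_t, best_sign = a, float(t), sign
    return best_acc, best_t, best_sign


def brute_force_1d(embeddings, labels):
    """Per-dimension variance and best 1-D threshold accuracy."""
    embeddings = np.asarray(embeddings, dtype=float)
    variances = embeddings.var(axis=0)
    accuracies = np.array(
        [
            best_threshold_accuracy(embeddings[:, d], labels)[0]
            for d in range(embeddings.shape[1])
        ]
    )
    return variances, accuracies


# Synthetic stand-in for sentence embeddings (hypothetical data):
# dimension 3 is given high variance and made to carry the label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
embeddings = rng.normal(size=(200, 16))
embeddings[:, 3] = np.where(labels == 1, 10.0, -10.0) + rng.normal(size=200)

variances, accuracies = brute_force_1d(embeddings, labels)
print("highest-variance dimension:", int(np.argmax(variances)))
print("best 1-D dimension:", int(np.argmax(accuracies)))
```

On real models the embeddings would come from the fine-tuned encoder and the threshold would be fit on training data and evaluated on held-out data; the sketch fits and scores on the same sample only to keep it short.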
Abstract
Representations from large language models (LLMs) are known to be dominated by a small subset of dimensions with exceedingly high variance. Previous works have argued that although ablating these outlier dimensions in LLM representations hurts downstream performance, outlier dimensions are detrimental to the representational quality of embeddings. In this study, we investigate how fine-tuning impacts outlier dimensions and show that 1) outlier dimensions that occur in pre-training persist in fine-tuned models and 2) a single outlier dimension can complete downstream tasks with a minimal error rate. Our results suggest that outlier dimensions can encode crucial task-specific knowledge and that the value of a representation in a single outlier dimension drives downstream model decisions.
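The persistence claim can be checked with a short sketch like the one below. It assumes an outlier dimension is one whose variance exceeds some multiple of the mean per-dimension variance. The multiple of 5 used here is an illustrative assumption, not a value taken from the text above, and the helper names (`outlier_dimensions`, `persistence`) and synthetic arrays are hypothetical.

```python
import numpy as np


def outlier_dimensions(embeddings, factor=5.0):
    """Indices of dimensions whose variance exceeds factor * mean variance.

    `factor` is an assumed cutoff chosen for illustration.
    """
    variances = np.asarray(embeddings, dtype=float).var(axis=0)
    return set(np.flatnonzero(variances > factor * variances.mean()).tolist())


def persistence(pretrained_outliers, finetuned_outliers):
    """Fraction of pre-trained outlier dimensions still flagged after fine-tuning."""
    if not pretrained_outliers:
        return 0.0
    return len(pretrained_outliers & finetuned_outliers) / len(pretrained_outliers)


# Synthetic stand-ins for embeddings from a pre-trained and a fine-tuned model.
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(500, 64))
pretrained[:, [7, 21]] *= 20.0  # two high-variance dimensions

finetuned = rng.normal(size=(500, 64))
finetuned[:, [7, 21]] *= 15.0   # same dimensions remain high-variance
finetuned[:, 40] *= 12.0        # one new high-variance dimension appears

pre_out = outlier_dimensions(pretrained)
fine_out = outlier_dimensions(finetuned)
print("pre-trained outliers:", sorted(pre_out))
print("fine-tuned outliers:", sorted(fine_out))
print("persistence:", persistence(pre_out, fine_out))
```

A persistence value near 1 would indicate that fine-tuning kept the pre-trained outlier dimensions, which is the pattern the abstract reports.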
