Outlier Dimensions Encode Task-Specific Knowledge

William Rudman, Catherine Chen, Carsten Eickhoff

TL;DR

The paper investigates why a small set of outlier dimensions, whose variance far exceeds the average variance across dimensions, dominates LLM embeddings, and how fine-tuning affects these dimensions. It demonstrates that outlier dimensions from pre-training persist after fine-tuning across tasks and that, in some models, a single dimension (the principal outlier ρ) can encode enough task-specific knowledge to perform downstream classification via a simple linear threshold. Through activation diagrams and a brute-force 1-D analysis, the authors reveal a robust correlation between a dimension's variance and its 1-D task performance, while also showing that non-principal outliers can achieve high accuracy and occasionally outperform full representations. These findings challenge the view that outlier dimensions are universally detrimental and highlight the relevance of low-dimensional subspaces for transfer learning and interpretability, with implications for efficient representation and model design.

Abstract

Representations from large language models (LLMs) are known to be dominated by a small subset of dimensions with exceedingly high variance. Previous works have argued that although ablating these outlier dimensions in LLM representations hurts downstream performance, outlier dimensions are detrimental to the representational quality of embeddings. In this study, we investigate how fine-tuning impacts outlier dimensions and show that 1) outlier dimensions that occur in pre-training persist in fine-tuned models and 2) a single outlier dimension can complete downstream tasks with a minimal error rate. Our results suggest that outlier dimensions can encode crucial task-specific knowledge and that the value of a representation in a single outlier dimension drives downstream model decisions.
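
The idea of completing a binary task from a single dimension with a linear threshold can be illustrated with a minimal sketch. It is not the authors' code: the function names `fit_threshold_1d` and `predict_threshold_1d`, the exhaustive midpoint search, and the toy data are all assumptions made for illustration. Given the values of one embedding dimension and binary labels, it searches for the cut point and polarity that maximize accuracy.

```python
import numpy as np


def fit_threshold_1d(values, labels):
    """Search for the threshold and polarity on one dimension that
    maximize binary accuracy (hypothetical helper, illustration only)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=int)
    uniq = np.unique(values)  # sorted unique activations
    # Candidate cut points: one below the minimum, then midpoints
    # between consecutive unique values.
    candidates = np.concatenate(([uniq[0] - 1.0], (uniq[:-1] + uniq[1:]) / 2.0))
    best_acc, best_t, best_pol = 0.0, float(candidates[0]), 1
    for t in candidates:
        acc = float(np.mean((values > t).astype(int) == labels))
        # polarity +1: "above threshold" means class 1; -1 flips the rule.
        for polarity, a in ((1, acc), (-1, 1.0 - acc)):
            if a > best_acc:
                best_acc, best_t, best_pol = a, float(t), polarity
    return best_acc, best_t, best_pol


def predict_threshold_1d(values, threshold, polarity):
    """Apply a fitted 1-D threshold rule to new activations."""
    pred = (np.asarray(values, dtype=float) > threshold).astype(int)
    return pred if polarity == 1 else 1 - pred


# Toy activations of a single dimension for six sentences (made-up numbers).
dim_values = [-2.0, -1.5, -1.0, 1.0, 1.2, 2.0]
dim_labels = [0, 0, 0, 1, 1, 1]
accuracy, threshold, polarity = fit_threshold_1d(dim_values, dim_labels)
```

In practice the threshold would be fit on training-set activations of the chosen dimension and evaluated on held-out data; the toy example only shows the mechanics.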

Paper Structure

This paper contains 24 sections, 2 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: Average activation diagrams of sentence embeddings on the SST-2 validation dataset. The x-axis represents the index of the dimension, and the y-axis is the corresponding magnitude in that given dimension. Top: pre-trained models where no fine-tuning occurs. Bottom: models fine-tuned to complete SST-2.
  • Figure 2: Frequency of the 7 most commonly occurring outlier dimensions across all fine-tuning tasks and all random seeds. The x-axis plots the dimension frequency, and the y-axis plots the dimension index.
  • Figure 3: Comparing the downstream performance of all 1-D subspaces of sentence embedding activations on QNLI against the variance in that given dimension. Results for all tasks are available in the Appendix. The red dashed line indicates the threshold for whether a dimension qualifies as an "outlier dimension" (i.e., 5x the average variance in vector space).
  • Figure 4: Average activation diagrams of sentence embeddings on the SST-2 validation dataset. The x-axis represents the index of the dimension, and the y-axis is the corresponding magnitude in that given dimension. Top: pre-trained models where no fine-tuning occurs. Bottom: models fine-tuned to complete SST-2.
  • Figure 5: Comparing the downstream performance of all 1-D subspaces of sentence embedding activations on SST-2, RTE, MRPC, and QQP against the variance in that given dimension. Results for all tasks are available in the Appendix. The red dashed line indicates the threshold for whether a dimension qualifies as an "outlier dimension" (i.e., 5x the average variance in vector space).
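
The outlier criterion quoted in the captions for Figures 3 and 5 (variance above 5x the average variance) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `find_outlier_dimensions`, the `factor` parameter, and the synthetic embedding matrix are assumptions.

```python
import numpy as np


def find_outlier_dimensions(embeddings, factor=5.0):
    """Return indices of dimensions whose variance exceeds `factor` times
    the average per-dimension variance, sorted by descending variance
    (hypothetical helper, illustration only)."""
    embeddings = np.asarray(embeddings, dtype=float)  # shape: (n_sentences, n_dims)
    variances = embeddings.var(axis=0)                # variance of each dimension
    cutoff = factor * variances.mean()                # the "5x average variance" line
    outliers = np.flatnonzero(variances > cutoff)
    order = np.argsort(variances[outliers])[::-1]     # largest variance first
    return outliers[order], variances


# Synthetic data: 100 "sentences", 16 dimensions, with dimension 3 scaled up.
base = np.array([-1.0, 1.0] * 50)
toy_embeddings = np.outer(base, np.ones(16))
toy_embeddings[:, 3] *= 20.0

outlier_dims, dim_variances = find_outlier_dimensions(toy_embeddings)
```

Here only dimension 3 clears the cutoff, since its variance (400) is well above five times the average variance across all 16 dimensions.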