Foundation Model's Embedded Representations May Detect Distribution Shift

Max Vargas; Adam Tsou; Andrew Engel; Tony Chiang

TL;DR

This work addresses distribution shift between Sentiment140's automatically labeled training set $P$ and its manually labeled test set $M$ in the context of transfer learning (TL) with foundation-model representations. It introduces a data-centric, PCA-based method for detecting shift in final-layer embeddings and compares two training regimes, full fine-tuning on $P$ and linear probing on $M$, to assess generalization. Key findings: the embeddings of many foundation models separate $P$ from $M$; fine-tuning on $P$ can degrade performance on $M$; and linear probes on pre-trained features, trained on a sample of $M$, generalize robustly and data-efficiently. The study underscores the need to match train/test populations, advocates cautious pre-processing before TL, and suggests avenues for quantitative, architecture-aware analysis of distribution shift.
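A minimal sketch of the PCA step, assuming final-layer embeddings for $P$ and $M$ have already been extracted as NumPy arrays (synthetic here) and that the principal components are fit on the pooled data; the source does not specify these details, and `top2_pca_projection` and `centroid_gap` are hypothetical helpers. The idea is that if $P$ and $M$ come from different distributions, their projections onto the leading components should occupy visibly different regions.

```python
import numpy as np


def top2_pca_projection(emb_p, emb_m):
    """Project P and M embeddings onto the two largest principal components
    of the pooled (P and M together) embedding matrix."""
    pooled = np.vstack([emb_p, emb_m])
    mean = pooled.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    components = vt[:2]                       # shape (2, d)
    proj_p = (emb_p - mean) @ components.T    # shape (n_p, 2)
    proj_m = (emb_m - mean) @ components.T    # shape (n_m, 2)
    return proj_p, proj_m


def centroid_gap(proj_p, proj_m):
    """Distance between the P and M centroids in PC space, with each
    component scaled by its pooled standard deviation (a crude summary)."""
    std = np.vstack([proj_p, proj_m]).std(axis=0)
    return float(np.linalg.norm((proj_p.mean(axis=0) - proj_m.mean(axis=0)) / std))


# Hypothetical stand-ins for final-layer embeddings (synthetic, not paper data).
rng = np.random.default_rng(0)
d = 64
emb_p = rng.normal(size=(500, d))
emb_m = rng.normal(size=(200, d))
emb_m[:, :4] += 3.0  # inject a mean shift into M along a few dimensions

proj_p, proj_m = top2_pca_projection(emb_p, emb_m)
print(proj_p.shape, proj_m.shape)     # (500, 2) (200, 2)
print(centroid_gap(proj_p, proj_m))   # well above the gap for an unshifted M
```

Fitting the components on the pooled data (rather than on $P$ alone) lets a shifted $M$ pull a leading component toward the shift direction, which is what makes the separation visible in two dimensions.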

Abstract

Sampling biases can cause distribution shifts between the train and test datasets of a supervised learning task, obscuring our ability to understand a model's generalization capacity. This is especially important given the wide adoption of pre-trained foundation neural networks, whose behavior remains poorly understood, for transfer learning (TL) tasks. We present a case study of TL on the Sentiment140 dataset and show that many pre-trained foundation models encode Sentiment140's manually curated test set $M$ differently from its automatically labeled training set $P$, confirming that a distribution shift has occurred. We argue that training on $P$ and measuring performance on $M$ gives a biased measure of generalization. Experiments on pre-trained GPT-2 show that the features learnable from $P$ do not improve (and in fact hamper) performance on $M$. Linear probes on pre-trained GPT-2's representations are robust and may even outperform full fine-tuning, underscoring the importance of detecting distribution shift in train/test splits when interpreting models.
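The linear-probe regime can be sketched as follows, assuming the pre-trained encoder's features for $M$ are already available as a NumPy array (synthetic here). Only a logistic-regression head is trained, on a sample of 300 points from $M$ (the sample size reported for the random-feature probe in Figure 5), and it is evaluated on the remainder; the pre-trained weights stay frozen. `train_linear_probe`, `probe_accuracy`, and the hyperparameters are hypothetical, not the paper's code.

```python
import numpy as np


def train_linear_probe(x, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe on frozen features x (n, d), y in {0, 1},
    by full-batch gradient descent. The encoder itself is never updated."""
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        probs = 0.5 * (1.0 + np.tanh(0.5 * (x @ w + b)))  # numerically stable sigmoid
        grad = probs - y                                 # d(mean BCE)/d(logit)
        w -= lr * (x.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(w, b, x, y):
    """Fraction of examples whose predicted label (logit > 0) matches y."""
    return float(((x @ w + b > 0).astype(int) == y).mean())


# Hypothetical frozen embeddings of M with binary sentiment labels (synthetic).
rng = np.random.default_rng(1)
n, d = 400, 32
x_m = rng.normal(size=(n, d))
y_m = (x_m @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0).astype(int)

# Train on a sample of 300 points from M, evaluate on the remainder.
idx = rng.permutation(n)
train, test = idx[:300], idx[300:]
w, b = train_linear_probe(x_m[train], y_m[train])
print("train acc:", probe_accuracy(w, b, x_m[train], y_m[train]))
print("test acc:", probe_accuracy(w, b, x_m[test], y_m[test]))
```

Passing features from a randomly initialized GPT-2 instead of the pre-trained one, with everything else unchanged, gives the kind of random-feature baseline shown in Figures 4 and 5.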

Paper Structure (25 sections, 8 figures, 5 tables)

Introduction
Related Work
Background and Methods
Results
LLMs can separate $P$ and $M$ with pre-trained weights.
Fine-tuning on datasets similar to $M$ does not generalize to $M$.
Discussion
Limitations and Future Work
Experimental Details
Uncertainty Estimates
Dataset and Model Sourcing
Pre-Processing Steps
Feature Extraction for Distribution Shift
Additional Experiments on Distributional Shift via Linear Classifiers
Additional Experiments on $P$ and $M$
...and 10 more sections

Figures (8)

Figure 1: Kernel Density Estimates of the two largest principal components of the pre-trained final embedding representation of the automatically labeled training dataset $P$ and the manually curated testing dataset $M$ using various LLMs. Sub-figures are ordered by increasing number of model parameters.
Figure 2: Full fine-tuning of pre-trained GPT-2, trained on a sample of $P$ and evaluated on $M$.
Figure 3: Linear probe of pre-trained GPT-2, trained on a sample of $M$ and evaluated on the remainder of $M$.
Figure 4: Linear probe of random-feature GPT-2, trained on a sample of $M$ and evaluated on the remainder.
Figure 5: Training and test accuracy of a linear probe on random features from the GPT-2 architecture, trained on a held-out sample of 300 points from $M$ and evaluated on the remainder.
...and 3 more figures
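Figure 1 visualizes the shift through kernel density estimates on the two largest principal components. As a hedged sketch, assuming those 2-D projections are already in hand (synthetic arrays here), the separation can be reduced to a single number: the Bhattacharyya coefficient between the two KDEs. The functions `kde_2d` and `kde_overlap`, the bandwidth, and the overlap score itself are hypothetical illustrative choices; the figure shows densities, not this statistic.

```python
import numpy as np


def kde_2d(points, gx, gy, bandwidth):
    """Evaluate an isotropic 2-D Gaussian KDE of `points` (n, 2) on the grid gx x gy."""
    xx, yy = np.meshgrid(gx, gy)
    grid = np.stack([xx.ravel(), yy.ravel()], axis=1)   # (G, 2)
    dx = grid[:, None, 0] - points[None, :, 0]          # (G, n)
    dy = grid[:, None, 1] - points[None, :, 1]          # (G, n)
    k = np.exp(-0.5 * (dx ** 2 + dy ** 2) / bandwidth ** 2)
    dens = k.sum(axis=1) / (len(points) * 2 * np.pi * bandwidth ** 2)
    return dens.reshape(xx.shape)


def kde_overlap(proj_p, proj_m, bandwidth=0.5, n_grid=60):
    """Bhattacharyya coefficient between the KDEs of P and M in PC space:
    close to 1 for matching densities, close to 0 for separated ones."""
    both = np.vstack([proj_p, proj_m])
    lo = both.min(axis=0) - 3 * bandwidth
    hi = both.max(axis=0) + 3 * bandwidth
    gx = np.linspace(lo[0], hi[0], n_grid)
    gy = np.linspace(lo[1], hi[1], n_grid)
    cell = (gx[1] - gx[0]) * (gy[1] - gy[0])
    f_p = kde_2d(proj_p, gx, gy, bandwidth)
    f_m = kde_2d(proj_m, gx, gy, bandwidth)
    f_p /= f_p.sum() * cell  # renormalize on the finite grid
    f_m /= f_m.sum() * cell
    return float(np.sqrt(f_p * f_m).sum() * cell)


# Hypothetical 2-D projections (synthetic stand-ins for PC1/PC2 coordinates).
rng = np.random.default_rng(2)
proj_p = rng.normal(size=(500, 2))
proj_m_same = rng.normal(size=(200, 2))
proj_m_shifted = rng.normal(size=(200, 2)) + np.array([4.0, 0.0])
print(kde_overlap(proj_p, proj_m_same))     # high overlap
print(kde_overlap(proj_p, proj_m_shifted))  # low overlap
```

A grid-based estimate keeps the sketch dependency-free beyond NumPy; with larger samples a library KDE and a finer grid would be the more practical choice.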
