
Foundation Model's Embedded Representations May Detect Distribution Shift

Max Vargas, Adam Tsou, Andrew Engel, Tony Chiang

TL;DR

This work addresses distribution shift between Sentiment140's automatically labeled training set $P$ and its manually labeled test set $M$ in the context of transfer learning (TL) with foundation-model representations. It introduces a PCA-based, data-centric method for detecting shift in final-layer embeddings and compares two training regimes, full fine-tuning on $P$ and linear probing on $M$, to assess generalization. The key findings are that many foundation models' embeddings separate $P$ from $M$, and that fine-tuning on $P$ can degrade performance on $M$, whereas linear probes trained on $M$ using pre-trained features generalize robustly and data-efficiently. The study underscores the need to match train and test populations, advocates caution in pre-processing before TL, and suggests avenues for quantitative, architecture-aware analysis of distribution shift.
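The PCA-based shift check can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the final-layer embeddings of $P$ and $M$ are already available as NumPy arrays (one vector per tweet), fits the principal components on the union of both sets, and adds a hypothetical `standardized_mean_gap` score as a crude numeric summary. The paper itself inspects kernel density plots of the projections (Figure 1).

```python
import numpy as np


def pca_project(emb_p, emb_m, k=2):
    """Project two embedding sets onto the top-k principal components of their union.

    emb_p, emb_m: arrays of shape (n_p, d) and (n_m, d), assumed to hold
    final-layer embeddings of the training set P and the test set M.
    """
    pooled = np.vstack([emb_p, emb_m])
    mean = pooled.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(pooled - mean, full_matrices=False)
    components = vt[:k]
    proj_p = (emb_p - mean) @ components.T
    proj_m = (emb_m - mean) @ components.T
    return proj_p, proj_m


def standardized_mean_gap(proj_p, proj_m):
    """Hypothetical shift score: |mean_P - mean_M| / pooled std, per component."""
    pooled_std = np.vstack([proj_p, proj_m]).std(axis=0)
    return np.abs(proj_p.mean(axis=0) - proj_m.mean(axis=0)) / pooled_std
```

Fitting the components on the pooled data places $P$ and $M$ in one shared coordinate system, so a large gap along a leading component indicates that the dominant directions of variation already distinguish the two sets.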

Abstract

Sampling biases can cause distribution shifts between the train and test datasets of a supervised learning task, obscuring our understanding of a model's generalization capacity. This is especially important given the wide adoption of pre-trained foundation neural networks, whose behavior remains poorly understood, for transfer learning (TL) tasks. We present a case study of TL on the Sentiment140 dataset and show that many pre-trained foundation models encode Sentiment140's manually curated test set $M$ differently from its automatically labeled training set $P$, confirming that a distribution shift has occurred. We argue that training on $P$ and measuring performance on $M$ gives a biased measure of generalization. Experiments on pre-trained GPT-2 show that the features learnable from $P$ do not improve (and in fact hamper) performance on $M$. Linear probes on pre-trained GPT-2's representations are robust and may even outperform full fine-tuning, underscoring the fundamental importance of discerning distribution shift in train/test splits for model interpretation.
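A linear probe keeps the pre-trained backbone frozen and fits only a linear classifier on its representations. A minimal NumPy sketch, assuming the GPT-2 features of $M$ have already been extracted into an array (feature extraction is not shown), might look like this. The helper names, the plain gradient-descent logistic regression, and hyperparameters such as `lr`, `epochs`, and `l2` are illustrative assumptions, not the paper's training setup.

```python
import numpy as np


def split_sample(n, n_train, seed=0):
    """Randomly split indices 0..n-1 into a training sample and the held-out remainder."""
    perm = np.random.default_rng(seed).permutation(n)
    return perm[:n_train], perm[n_train:]


def train_linear_probe(feats, labels, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe on frozen features by full-batch gradient descent.

    feats:  (n, d) frozen embeddings (the backbone itself is never updated).
    labels: (n,) binary sentiment labels in {0, 1}.
    """
    n, d = feats.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        logits = np.clip(feats @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        probs = 1.0 / (1.0 + np.exp(-logits))
        err = probs - labels                          # per-example d(BCE)/d(logit)
        w -= lr * (feats.T @ err / n + l2 * w)        # L2-regularized weight update
        b -= lr * err.mean()
    return w, b


def probe_accuracy(feats, labels, w, b):
    """Fraction of examples whose thresholded probe output matches the label."""
    preds = (feats @ w + b > 0).astype(int)
    return float((preds == labels).mean())
```

Per Figure 5's caption, the training sample would be 300 points from $M$ with the remainder held out; `split_sample(len(feats_m), 300)` would produce such a split for a hypothetical feature array `feats_m`.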

Paper Structure

This paper contains 25 sections, 8 figures, and 5 tables.

Figures (8)

  • Figure 1: Kernel density estimates of the two largest principal components of the pre-trained final-layer embeddings of the automatically labeled training set $P$ and the manually curated test set $M$, for various LLMs. Subfigures are ordered by increasing number of model parameters.
  • Figure 2: Full fine-tuning of pre-trained GPT-2, trained on a sample of $P$ and evaluated on $M$.
  • Figure 3: Linear probe of pre-trained GPT-2, trained on a sample of $M$ and evaluated on the remainder of $M$.
  • Figure 4: Linear probe of random-feature GPT-2, trained on a sample of $M$ and evaluated on the remainder of $M$.
  • Figure 5: Training and test accuracy of a linear probe on random features from the GPT-2 architecture, trained on a held-out sample of 300 points from $M$ and evaluated on the remainder.
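The kernel density estimates described for Figure 1 can be approximated with the NumPy-only sketch below. It assumes the top-two principal-component scores of $P$ and $M$ are already computed, uses a 2-D Gaussian kernel with Scott's-rule bandwidth (the default of `scipy.stats.gaussian_kde`), and adds a hypothetical `density_overlap` score, the overlap coefficient $\int \min(f_P, f_M)$, to turn the visual comparison into a single number. None of this is claimed to be the authors' plotting code.

```python
import numpy as np


def gaussian_kde_2d(points, grid):
    """Evaluate a 2-D Gaussian KDE at `grid` using Scott's-rule bandwidth.

    points: (n, 2) samples, e.g. top-2 principal-component scores of one dataset.
    grid:   (g, 2) locations at which to evaluate the density.
    """
    n = points.shape[0]
    factor = n ** (-1.0 / 6.0)                      # Scott's factor n^(-1/(d+4)) with d = 2
    cov = np.cov(points, rowvar=False) * factor**2  # kernel covariance
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    diff = grid[:, None, :] - points[None, :, :]    # (g, n, 2) pairwise offsets
    mahal = np.einsum("gni,ij,gnj->gn", diff, inv, diff)  # squared Mahalanobis distances
    return norm * np.exp(-0.5 * mahal).mean(axis=1)


def density_overlap(proj_p, proj_m, n_grid=64):
    """Overlap coefficient: integral of min(f_P, f_M), approximated on a regular grid.

    Values near 1 mean the two KDEs coincide; values near 0 mean P and M separate.
    """
    both = np.vstack([proj_p, proj_m])
    pad = 3.0 * both.std(axis=0)                    # extend the grid past the data range
    lo, hi = both.min(axis=0) - pad, both.max(axis=0) + pad
    xs = np.linspace(lo[0], hi[0], n_grid)
    ys = np.linspace(lo[1], hi[1], n_grid)
    xx, yy = np.meshgrid(xs, ys)
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])        # area of one grid cell
    f_p = gaussian_kde_2d(proj_p, grid)
    f_m = gaussian_kde_2d(proj_m, grid)
    return float(np.minimum(f_p, f_m).sum() * cell)
```

Plotting `f_p` and `f_m` as contours over `xx`, `yy` would reproduce the kind of picture the Figure 1 caption describes; the overlap score is only a convenience and does not replace visual inspection.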