
Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift

Benjamin Eyre, Elliot Creager, David Madras, Vardan Papyan, Richard Zemel

TL;DR

This work analyzes why ordinary least squares regression fails under covariate shift by linking OOD risk to the interaction of the source and target eigenspectra and identifying Spectral Inflation as the core culprit. It introduces the Spectral Adapted Regressor (SpAR), a lightweight, post-hoc method that uses unlabeled target data to project out spectral directions prone to inflation, thereby adapting the last layer of a pre-trained neural regressor. The authors derive a bias-variance decomposition per eigenvector, provide an $\alpha$-based rule for selecting which spectral components to remove, and prove that the optimal projection improves OOD performance under the stated assumptions. Empirically, SpAR improves out-of-distribution performance across synthetic, tabular, image, and PovertyMap-WILDS datasets, often outperforming stronger baselines while remaining computationally efficient.

Abstract

Designing deep neural network classifiers that perform robustly on distributions differing from the available training data is an active area of machine learning research. However, out-of-distribution generalization for regression, the analogous problem for modeling continuous targets, remains relatively unexplored. To tackle this problem, we return to first principles and analyze how the closed-form solution for Ordinary Least Squares (OLS) regression is sensitive to covariate shift. We characterize the out-of-distribution risk of the OLS model in terms of the eigenspectrum decomposition of the source and target data. We then use this insight to propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution. We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
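
The sensitivity of closed-form OLS to covariate shift can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the 2D toy data, the scales, the noise level, and the names `w_star`, `id_mse`, and `ood_mse`. The source inputs have almost no variance in the second coordinate, so the pseudoinverse solution $\hat{w} = X^{\dagger}Y$ estimates that weight poorly, and the error surfaces once the target inputs vary along it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): 2D inputs and true weights.
w_star = np.array([1.0, 0.1])
n_train, n_test, noise_std = 200, 200, 0.5

# Source inputs: large horizontal variance, nearly zero vertical variance.
X = rng.normal(size=(n_train, 2)) * np.array([1.0, 0.001])
# Target inputs: large variance in the vertical direction (covariate shift).
Z = rng.normal(size=(n_test, 2)) * np.array([1.0, 3.0])

# Noisy training labels from the linear model.
Y = X @ w_star + noise_std * rng.normal(size=n_train)

# Closed-form OLS via the pseudoinverse: w_hat = X^+ Y.
w_hat = np.linalg.pinv(X) @ Y

# Squared error against the noiseless labels, in and out of distribution.
id_mse = np.mean((X @ w_hat - X @ w_star) ** 2)
ood_mse = np.mean((Z @ w_hat - Z @ w_star) ** 2)
print(f"w_hat={w_hat}, ID MSE={id_mse:.5f}, OOD MSE={ood_mse:.5f}")
```

In this setup the in-distribution error stays small because the badly estimated weight multiplies a near-zero input, while the same weight multiplies a high-variance input at test time.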

Paper Structure

This paper contains 29 sections, 6 theorems, 80 equations, 4 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1

Assuming the data generative procedure defined in Equation \ref{eq:x_data_generation}, and that $w^* \in \mathrm{Span}(\mathrm{Rows}(X))$ and $\mathrm{Rows}(Z) \subset \mathrm{Span}(\mathrm{Rows}(X))$, the OOD squared error loss of the estimator $\hat{w} = X^{\dagger}Y$ is equal to:

Figures (4)

  • Figure 1: Ordinary Least Squares Regression under Covariate Shift. (a) Points are 2D input samples in the training set $X$ and test set $Z$. The in-distribution (ID) training data demonstrates nearly zero vertical variance, while the out-of-distribution (OOD) test data varies significantly in this direction. (b) Samples in $Z$ colored according to their true, noiseless labels $Zw^*$. (c) Samples in $Z$ colored according to their OLS predictions $Z\hat{w}$. Crucially, to minimize training risk, OLS learns to weigh the vertical component highly, causing erroneous OOD predictions. (e) SpAR identifies a spectral subspace $S$ where train and test variance differ the most, and projects it out. Thus, the regressor created by SpAR ignores the direction with high variance and nearly recovers $w^*$.
  • Figure 2: Spectral Inflation. We use the PovertyMap-WILDS dataset (Koh et al., 2021) to investigate how input spectra change when a regressor trained on real-world data generalizes to (perhaps shifted) test data. $X$ and $Z$ are composed of representations from a DNN. $Z$ represents data either from an in-distribution or out-of-distribution test set. $\mathrm{Var}_{z,j}$, as defined in Equation \ref{eq:individual_loss_contributions}, measures the amount of Spectral Inflation (small amounts of training set variation becoming large at test time) occurring along a given test eigenvector. Because each test sample has a different number of examples $M$, we normalize for a fair comparison. We see that when $Z$ is an out-of-distribution sample, much more spectral inflation occurs than when we generalize to an in-distribution sample.
  • Figure 3: Tabular data. OOD RMSE for several methods, each averaged across 10 seeds.
  • Figure 4: Hyperparameter sensitivity. SpAR performance as a function of $\alpha$ on tabular datasets.
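
The projection idea described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the authors' SpAR implementation: the function name `spectral_projection`, the toy data, and the selection rule (remove the target eigenvector with the largest target-to-source variance ratio) are hypothetical stand-ins, and the paper's $\alpha$-based rule is not reproduced here.

```python
import numpy as np

def spectral_projection(X, Y, Z, n_remove=1):
    """Fit OLS on (X, Y), then project out target eigen-directions.

    Illustrative heuristic (an assumption, not the paper's alpha-based rule):
    remove the n_remove eigenvectors of Z^T Z whose target-to-source
    variance ratio is largest.
    """
    w_ols = np.linalg.pinv(X) @ Y
    # Eigenvectors (columns) of the unlabeled target second-moment matrix.
    _, eigvecs = np.linalg.eigh(Z.T @ Z)
    # Variance of each dataset along every target eigenvector.
    var_z = np.mean((Z @ eigvecs) ** 2, axis=0)
    var_x = np.mean((X @ eigvecs) ** 2, axis=0)
    ratio = var_z / (var_x + 1e-12)
    # Columns of S span the subspace that gets projected out.
    S = eigvecs[:, np.argsort(ratio)[-n_remove:]]
    w_proj = w_ols - S @ (S.T @ w_ols)
    return w_ols, w_proj, S

# Hypothetical toy data mirroring the caption: the source has nearly zero
# vertical variance, the target varies strongly in that direction.
rng = np.random.default_rng(0)
w_star = np.array([1.0, 0.1])
X = rng.normal(size=(200, 2)) * np.array([1.0, 0.001])
Z = rng.normal(size=(200, 2)) * np.array([1.0, 3.0])
Y = X @ w_star + 0.5 * rng.normal(size=200)

w_ols, w_proj, S = spectral_projection(X, Y, Z)

def ood_mse(w):
    # Squared error on the target inputs against the noiseless labels.
    return np.mean((Z @ w - Z @ w_star) ** 2)

print(f"OLS OOD MSE={ood_mse(w_ols):.4f}, projected OOD MSE={ood_mse(w_proj):.4f}")
```

Projecting out a direction trades variance for bias: the regressor no longer uses the poorly estimated weight, but it also discards whatever true signal lies along that direction. In this toy setup the true weight there is small, so the trade is favorable; choosing which directions to remove in general is what the paper's $\alpha$-based rule addresses.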

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Corollary 3.1
  • Proposition 1
  • Lemma 1