Minimum-Norm Interpolation Under Covariate Shift
Neil Mallinar, Austin Zane, Spencer Frei, Bin Yu
TL;DR
This work develops the first finite-sample, instance-wise excess-risk bounds for the minimum-norm interpolator under covariate shift in high-dimensional linear regression. It focuses on source distributions that satisfy benign overfitting and assumes that the source and target covariances commute. The analysis decomposes the risk into bias and variance components and introduces a taxonomy of covariate shifts (beneficial and malignant) driven by eigenvalue ratios and the degree of overparameterization, including mild and severe regimes. The theoretical results are complemented by synthetic and real-data experiments (e.g., CIFAR-10/10C and neural networks) that validate the shift taxonomy and show that overparameterization can improve out-of-distribution robustness under certain shifts. The findings clarify when interpolation remains robust under distribution shift and open directions for extending the theory beyond simultaneously diagonalizable covariances and to nonlinear models, with practical implications for transfer learning in noisy, high-dimensional settings.
Abstract
Transfer learning is a critical part of real-world machine learning deployments and has been extensively studied in experimental works with overparameterized neural networks. However, even in the simplest setting of linear regression, a notable gap still exists in the theoretical understanding of transfer learning. In-distribution research on high-dimensional linear regression has led to the identification of a phenomenon known as \textit{benign overfitting}, in which linear interpolators overfit to noisy training labels and yet still generalize well. This behavior occurs under specific conditions on the source covariance matrix and input data dimension. Therefore, it is natural to wonder how such high-dimensional linear models behave under transfer learning. We prove the first non-asymptotic excess risk bounds for benignly overfit linear interpolators in the transfer learning setting. From our analysis, we propose a taxonomy of \textit{beneficial} and \textit{malignant} covariate shifts based on the degree of overparameterization. We follow our analysis with empirical studies that show these beneficial and malignant covariate shifts for linear interpolators on real image data, and for fully-connected neural networks in settings where the input data dimension is larger than the training sample size.
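To make the setting concrete, the following is a minimal NumPy sketch of the objects the abstract refers to: a minimum-norm interpolator fit on source data with more features than samples, and its excess risk measured under a source and a target covariance. It is an illustration under stated assumptions, not the paper's experimental setup. The covariances are taken to be diagonal in a shared basis (one simple way to satisfy the commuting assumption), and the spectrum, the shift, the noise level, and all function and variable names are hypothetical choices made for this example.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Return the minimum-l2-norm solution of X @ beta = y.

    When n < d and X has full row rank, the pseudoinverse solution
    fits the training labels exactly and has the smallest Euclidean
    norm among all interpolating solutions.
    """
    return np.linalg.pinv(X) @ y


def excess_risk(beta_hat, beta_star, cov_diag):
    """Excess risk (beta_hat - beta_star)^T Sigma (beta_hat - beta_star)
    for a diagonal covariance Sigma = diag(cov_diag)."""
    diff = beta_hat - beta_star
    return float(np.sum(cov_diag * diff ** 2))


rng = np.random.default_rng(0)
n, d = 50, 500  # overparameterized: d > n

# Hypothetical source spectrum (an assumption, not taken from the paper).
source_eigs = 1.0 / np.arange(1, d + 1)

# Hypothetical covariate shift: halve the top five eigenvalues.
# Source and target are both diagonal, so the covariances commute.
target_eigs = source_eigs.copy()
target_eigs[:5] *= 0.5

# Ground-truth parameter and noisy source training data.
beta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d)) * np.sqrt(source_eigs)  # rows ~ N(0, diag(source_eigs))
y = X @ beta_star + 0.1 * rng.normal(size=n)

beta_hat = min_norm_interpolator(X, y)

# The interpolator fits the noisy labels (up to floating-point error).
train_residual = float(np.max(np.abs(X @ beta_hat - y)))

# Same estimator, evaluated under the source and the target covariance.
source_risk = excess_risk(beta_hat, beta_star, source_eigs)
target_risk = excess_risk(beta_hat, beta_star, target_eigs)

print(f"max train residual: {train_residual:.2e}")
print(f"source excess risk: {source_risk:.4f}")
print(f"target excess risk: {target_risk:.4f}")
```

Because the shift in this sketch only shrinks eigenvalues, the target excess risk cannot exceed the source excess risk. That follows directly from the quadratic form and is not an instance of the paper's taxonomy. Changing `target_eigs` (for example, inflating the tail eigenvalues) and varying `d` relative to `n` gives a quick way to explore how the shift interacts with overparameterization.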
