The Common Intuition to Transfer Learning Can Win or Lose: Case Studies for Linear Regression

Yehuda Dar; Daniel LeJeune; Richard G. Baraniuk

TL;DR

This paper analyzes transfer learning (TL) between two linear regression tasks, including highly overparameterized regimes. It introduces an intuitive TL objective that regularizes the distance between the target parameters and the transferred source parameters, and it derives exact and asymptotic expressions for the generalization error under orthonormal task relations. These results show that TL can resolve the double-descent peak and, when the source and target tasks are sufficiently related, can outperform optimally tuned ridge regression. The authors also show that ignoring the true task relation (e.g., using $\widetilde{\mathbf{H}}=\mathbf{I}_d$) can improve generalization in some settings because of conditioning effects. They then formulate a linear MMSE (LMMSE) transfer-learning estimator that universally improves over the intuitive approach. The work further extends to misspecified models and general task relations, and it highlights the practical value of transfer learning as a regularizer and of LMMSE as a principled, optimal linear strategy.

Abstract

We study a fundamental transfer learning process from source to target linear regression tasks, including overparameterized settings where there are more learned parameters than data samples. The target task learning is addressed by using its training data together with the parameters previously computed for the source task. We define a transfer learning approach to the target task as a linear regression optimization with a regularization on the distance between the to-be-learned target parameters and the already-learned source parameters. We analytically characterize the generalization performance of our transfer learning approach and demonstrate its ability to resolve the peak in generalization errors in double descent phenomena of the minimum L2-norm solution to linear regression. Moreover, we show that for sufficiently related tasks, the optimally tuned transfer learning approach can outperform the optimally tuned ridge regression method, even when the true parameter vector conforms to an isotropic Gaussian prior distribution. Namely, we demonstrate that transfer learning can beat the minimum mean square error (MMSE) solution of the independent target task. Our results emphasize the ability of transfer learning to extend the solution space to the target task and, by that, to have an improved MMSE solution. We formulate the linear MMSE solution to our transfer learning setting and point out its key differences from the common design philosophy to transfer learning.
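
The transfer learning approach described above can be illustrated with a minimal NumPy sketch. It assumes the objective takes the form $\min_{\mathbf{b}} \|\mathbf{y}-\mathbf{X}\mathbf{b}\|_2^2 + \alpha\|\mathbf{b}-\widetilde{\mathbf{H}}\widehat{\boldsymbol{\theta}}\|_2^2$, where $\widehat{\boldsymbol{\theta}}$ denotes the already-learned source parameters. The function name `transfer_ridge`, the symbols $\alpha$ and $\widehat{\boldsymbol{\theta}}$, the exact scaling of the penalty, and the synthetic data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def transfer_ridge(X, y, theta_src, alpha, H=None):
    """Closed-form minimizer of ||y - X b||^2 + alpha * ||b - H theta_src||^2.

    Hypothetical helper for illustration; H=None means H is the identity.
    """
    d = X.shape[1]
    prior = theta_src if H is None else H @ theta_src
    # Setting the gradient to zero gives (X^T X + alpha I) b = X^T y + alpha * prior.
    A = X.T @ X + alpha * np.eye(d)
    rhs = X.T @ y + alpha * prior
    return np.linalg.solve(A, rhs)

# Synthetic overparameterized target task (assumed setup): d > n.
rng = np.random.default_rng(0)
n, d = 20, 100
beta = rng.normal(size=d) / np.sqrt(d)                       # true target parameters
theta_src = beta + 0.1 * rng.normal(size=d) / np.sqrt(d)     # closely related source parameters
X = rng.normal(size=(n, d))
y = X @ beta + 0.1 * rng.normal(size=n)

b_tl = transfer_ridge(X, y, theta_src, alpha=1.0)            # shrink toward source parameters
b_ridge = transfer_ridge(X, y, np.zeros(d), alpha=1.0)       # ordinary ridge: shrink toward zero

err_tl = np.sum((b_tl - beta) ** 2)
err_ridge = np.sum((b_ridge - beta) ** 2)
print(f"squared parameter error, transfer: {err_tl:.4f}, ridge: {err_ridge:.4f}")
```

Passing a zero vector as `theta_src` recovers ordinary ridge regression, which makes the comparison in the abstract concrete: the only difference between the two estimators in this sketch is the point toward which the solution is shrunk. A fixed `alpha=1.0` is used here for simplicity; the paper's comparisons concern optimally tuned regularization.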
