Task Shift: From Classification to Regression in Overparameterized Linear Models
Tyler LaBonte, Kuo-Wei Lai, Vidya Muthukumar
TL;DR
This work investigates whether estimators trained for classification can generalize to regression in overparameterized linear models with Gaussian covariates, in both the zero-shot and few-shot regimes. Using the minimum-norm interpolator (MNI) framework and a fine-grained, parameter-level analysis under anisotropic covariance, the authors show that zero-shot task shift is generally impossible for both sparse and random signals, even under benign overfitting. They then introduce a simple, practical few-shot postprocessing method with two steps: first identify the support of a sparse ground truth from attenuation patterns in the classification MNI, then run a least-squares fit on that support using a small regression dataset. This achieves a regression error of order $O\left(\frac{t}{m}\right)$ with $m$ regression samples. The results reveal a structured attenuation of classification signals that can be exploited for few-shot task shift, and they illuminate fundamental bias–task-shift tradeoffs in dense random-signal settings, with implications for understanding in-context learning and kernel/NTK regimes in high dimensions.
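For reference, the minimum-norm interpolator mentioned above has a standard closed form. The notation here ($X$, $y$, $w^\star$, $n$, $d$) is assumed for illustration and may differ from the paper's: $X \in \mathbb{R}^{n \times d}$ is the training design matrix with $d > n$ and full row rank, and $y \in \{-1, +1\}^n$ holds the classification labels.

$$
\hat{w} \;=\; \arg\min_{w \in \mathbb{R}^d} \|w\|_2 \quad \text{subject to} \quad Xw = y,
\qquad\text{so that}\qquad
\hat{w} \;=\; X^\top \left(X X^\top\right)^{-1} y,
\qquad y_i = \operatorname{sign}\!\left(x_i^\top w^\star\right).
$$

Zero-shot task shift then asks whether this $\hat{w}$, fit only to the signs $y_i$, is itself a good estimate of the regression parameter $w^\star$.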
Abstract
Modern machine learning methods have recently demonstrated remarkable capability to generalize under task shift, where latent knowledge is transferred to a different, often more difficult, task under a similar data distribution. We investigate this phenomenon in an overparameterized linear regression setting where the task shifts from classification during training to regression during evaluation. In the zero-shot case, wherein no regression data is available, we prove that task shift is impossible in both sparse signal and random signal models for any Gaussian covariate distribution. In the few-shot case, wherein limited regression data is available, we propose a simple postprocessing algorithm which asymptotically recovers the ground-truth predictor. Our analysis leverages a fine-grained characterization of individual parameters arising from minimum-norm interpolation which may be of independent interest. Our results show that while minimum-norm interpolators for classification cannot transfer to regression a priori, they experience surprisingly structured attenuation which enables successful task shift with limited additional data.
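The two regimes described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: isotropic Gaussian covariates (the paper treats general anisotropic covariance), a known sparsity level `k`, and a simple "keep the `k` largest-magnitude coordinates" rule standing in for the paper's support-identification step. The function names, problem sizes, and noise level are all hypothetical.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum-l2-norm w with X @ w = y, assuming X (n x d, d > n) has full row rank."""
    return X.T @ np.linalg.solve(X @ X.T, y)


def few_shot_postprocess(w_cls, X_reg, y_reg, k):
    """Keep the k largest-magnitude coordinates of w_cls, then refit them by least squares.

    The top-k rule is an assumed stand-in for the paper's support identification.
    """
    support = np.sort(np.argsort(np.abs(w_cls))[-k:])
    coef, *_ = np.linalg.lstsq(X_reg[:, support], y_reg, rcond=None)
    w_hat = np.zeros_like(w_cls)
    w_hat[support] = coef
    return w_hat, support


rng = np.random.default_rng(0)
n, d, m, k = 800, 4000, 20, 3  # illustrative sizes, not taken from the paper

# Sparse unit-norm ground truth supported on the first k coordinates.
w_star = np.zeros(d)
w_star[:k] = 1.0 / np.sqrt(k)

# Train on classification labels only: the signs of the regression targets.
X_cls = rng.standard_normal((n, d))
y_cls = np.sign(X_cls @ w_star)
w_cls = min_norm_interpolator(X_cls, y_cls)

# A small noisy regression dataset for the few-shot step.
X_reg = rng.standard_normal((m, d))
y_reg = X_reg @ w_star + 0.1 * rng.standard_normal(m)
w_few, support = few_shot_postprocess(w_cls, X_reg, y_reg, k)

# With isotropic covariates, excess regression risk equals squared parameter error.
zero_shot_err = np.sum((w_cls - w_star) ** 2)
few_shot_err = np.sum((w_few - w_star) ** 2)
print("recovered support:", support)
print("zero-shot error:", zero_shot_err)
print("few-shot error:", few_shot_err)
```

In this toy setting the classification MNI keeps its largest entries on the true support but at a heavily shrunken scale, so using it directly for regression leaves a large error, while the least-squares refit on the selected coordinates only has to estimate `k` numbers from `m` samples.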
