Meta-learning of shared linear representations beyond well-specified linear regression
Mathieu Even, Laurent Massoulié
TL;DR
This work extends meta-learning and multi-task learning by analyzing shared linear representations under general convex objectives, using mild assumptions on noise and Hessian concentration. It develops rank- and cluster-regularized estimators and derives sharp, high-probability generalization bounds that reveal how collaboration reduces sample complexity to roughly the order of $rd$ (up to log factors) for learning the shared structure. It also characterizes a single-sample-per-task regime, showing subspace recovery is possible only when the task count grows exponentially with the subspace dimension, and introduces a polynomial-time nuclear-norm relaxation that interpolates between no-collaboration and optimal non-convex rates. The framework thus supports few-shot learning and practical learning of shared representations in broad convex settings, including GLMs, beyond classical linear regression.
Abstract
Motivated by multi-task and meta-learning approaches, we consider the problem of learning structure shared by tasks or users, such as shared low-rank representations or clustered structures. While all previous works focus on well-specified linear regression, we consider more general convex objectives, where the structural low-rank and cluster assumptions are expressed on the optima of each function. We show that under mild assumptions such as \textit{Hessian concentration} and \textit{noise concentration at the optimum}, rank and clustered regularized estimators recover such structure, provided the number of samples per task and the number of tasks are large enough. We then study the problem of recovering the subspace in which all the solutions lie, in the setting where there is only a single sample per task: we show that in that case, the rank-constrained estimator can recover the subspace, but that the number of tasks needs to scale exponentially large with the dimension of the subspace. Finally, we provide a polynomial-time algorithm via nuclear norm constraints for learning a shared linear representation in the context of convex learning objectives.
