Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation
Yanhao Jin, Krishnakumar Balasubramanian, Debashis Paul
TL;DR
This work analyzes meta-learning under a high-dimensional multivariate random-effects model, showing how generalized ridge regression with a weight tied to the hyper-covariance $\Omega$ can improve generalization to unseen tasks. It establishes precise high-dimensional limits for the predictive risk, proves that using $\Omega^{-1}$ as the ridge weight is optimal, and develops a scalable, geodesically convex method-of-moments estimator for $\Omega$ (with extensions to sparse settings). The framework uses random matrix theory to characterize the limiting risk and Riemannian optimization to estimate $\Omega$ efficiently, without relying on a non-convex MLE. Numerical experiments confirm the theoretical gains, demonstrating improved predictive performance on new tasks, particularly when the hyper-covariance structure is accurately estimated or suitably regularized. The results provide a principled approach to meta-learning that exploits task similarities while remaining computationally feasible in high dimensions.
Abstract
Meta-learning involves training models on a variety of training tasks in a way that enables them to generalize well to new, unseen test tasks. In this work, we consider meta-learning within the framework of high-dimensional multivariate random-effects linear models and study predictions based on generalized ridge regression. The statistical intuition for using generalized ridge regression in this setting is that the covariance structure of the random regression coefficients can be leveraged to make better predictions on new tasks. Accordingly, we first characterize the precise asymptotic behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task. We next show that this predictive risk is optimal when the weight matrix in generalized ridge regression is chosen to be the inverse of the covariance matrix of the random coefficients. Finally, we propose and analyze an estimator of the inverse covariance matrix of the random regression coefficients based on data from the training tasks. As opposed to intractable MLE-type estimators, the proposed estimators can be computed efficiently because they are obtained by solving (globally) geodesically convex optimization problems. Our analysis and methodology use tools from random matrix theory and Riemannian optimization. Simulation results demonstrate the improved generalization performance of the proposed method on new, unseen test tasks within the considered framework.
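To make the role of the weight matrix concrete, the following is a minimal NumPy sketch of a generalized ridge estimator for a single task. It is an illustration under stated assumptions, not the paper's implementation: the function name `generalized_ridge`, the $n\lambda$ scaling of the penalty, the diagonal hyper-covariance `omega`, the $\Omega/p$ coefficient scaling, and the noise level are all hypothetical choices made for the example.

```python
import numpy as np


def generalized_ridge(X, y, A, lam):
    """Minimize ||y - X b||^2 / n + lam * b^T A b for a symmetric weight matrix A.

    The penalty scaling (n * lam) is an assumption of this sketch.
    """
    n = X.shape[0]
    # First-order condition: (X^T X + n * lam * A) b = X^T y
    return np.linalg.solve(X.T @ X + n * lam * A, X.T @ y)


rng = np.random.default_rng(0)
n, p = 50, 20

# Hypothetical hyper-covariance (diagonal, for illustration only).
omega = np.diag(np.linspace(0.1, 2.0, p))

# One simulated task: random coefficients with covariance omega / p (assumed scaling).
beta = rng.multivariate_normal(np.zeros(p), omega / p)
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# Weight matrix equal to the inverse hyper-covariance, as the abstract suggests.
A = np.linalg.inv(omega)
beta_hat = generalized_ridge(X, y, A, lam=0.1)
```

Setting `A` to the identity recovers ordinary ridge regression, so the same function can be used to compare the two weightings on simulated test tasks. In practice $\Omega$ is unknown and would be replaced by an estimate obtained from the training tasks.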
