Dynamics of Meta-learning Representation in the Teacher-student Scenario
Hui Wang, Cho Tung Yip, Bo Li
TL;DR
The paper studies the theoretical dynamics of gradient-based meta-learning in nonlinear two-layer networks trained on streaming tasks, with the aim of explaining how a shared meta-representation emerges. It uses a statistical-physics framework to derive a set of ODEs for macroscopic order parameters; these track the overlaps between meta-learner and meta-teacher representations and quantify meta-generalization. Key findings include a symmetry-breaking specialization process in which meta-learner units align with distinct meta-teacher units, the critical role of learning rates and overparameterization, and robustness to some variability in task mappings. The work provides a principled lens for studying meta-learning behavior and offers guidance for hyperparameter choices and model design, with potential extensions to other activations and regularization schemes.
Abstract
Gradient-based meta-learning algorithms have gained popularity for their ability to train models on new tasks using limited data. Empirical observations indicate that such algorithms learn a shared representation across tasks, which is regarded as a key factor in their success. However, an in-depth theoretical understanding of the learning dynamics and the origin of the shared representation remains underdeveloped. In this work, we investigate the meta-learning dynamics of nonlinear two-layer neural networks trained on streaming tasks in the teacher-student scenario. Through the lens of statistical physics, we characterize the macroscopic behavior of the meta-training process, the formation of the shared representation, and the generalization ability of the model on new tasks. The analysis also points to the importance of the choice of certain hyperparameters of the learning algorithms.
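To make the notion of macroscopic overlaps concrete, the following is a minimal sketch of how such quantities are commonly computed in teacher-student analyses of two-layer networks. It is an illustration only, not the paper's method: the names W, B, Q, R, T, the 1/N scaling, the Gaussian initialization, and the helper functions are all assumptions borrowed from the standard teacher-student convention.

```python
import numpy as np

def overlaps(W, B):
    """Overlap matrices between first-layer weights (assumed convention).

    W: learner weights, shape (K, N); B: teacher weights, shape (M, N).
    Returns Q (learner-learner), R (learner-teacher), T (teacher-teacher),
    each scaled by the input dimension N.
    """
    N = W.shape[1]
    Q = W @ W.T / N
    R = W @ B.T / N
    T = B @ B.T / N
    return Q, R, T

def cosine_overlaps(Q, R, T):
    """Normalize R so each entry is the cosine between a learner unit and a teacher unit."""
    return R / np.sqrt(np.outer(np.diag(Q), np.diag(T)))

# Hypothetical sizes: N inputs, K learner units, M teacher units (K > M: overparameterized).
rng = np.random.default_rng(0)
N, K, M = 1000, 3, 2
B = rng.standard_normal((M, N))         # fixed teacher representation
W = 0.01 * rng.standard_normal((K, N))  # small random learner initialization

Q, R, T = overlaps(W, B)
C = cosine_overlaps(Q, R, T)
# Teacher unit with which each learner unit is currently most aligned.
best_match = np.argmax(np.abs(C), axis=1)
```

In this picture, specialization would show up as each row of C developing one entry close to 1 in magnitude while the others stay small. At a random initialization such as the one above, all entries are small and no learner unit is tied to any particular teacher unit.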
