MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning
Sanchit Sinha, Yuguang Yue, Victor Soto, Mayank Kulkarni, Jianhua Lu, Aidong Zhang
TL;DR
This paper introduces MAML-en-LLM, a bi-level meta-learning framework that meta-trains LLMs via inner-task adaptation across multiple tasks and outer meta-updates with second-order gradients, aiming to produce truly generalizable parameters for in-context learning. By sharing optimizer moments between inner and outer updates, it stabilizes the dual optimization and enables consolidated meta-training, yielding improved generalization to unseen domains and better adaptation with limited data. Across two diverse datasets (CrossFit and UnifiedQA) and two model variants (standard and channel), MAML-en-LLM outperforms the state-of-the-art MetaICL in a majority of settings and demonstrates strong few-shot adaptation. The work highlights the impact of task complexity, the number of exploration tasks, and optimizer choice on performance, motivating broader adoption of classical meta-learning techniques for LLM meta-training and in-context learning.
Abstract
Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed, such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches essentially perform in-context multi-task fine-tuning and evaluate on a disjoint test set of tasks. Even though they achieve impressive performance, their goal is never to compute a truly general set of parameters. In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs, which can learn truly generalizable parameters that not only perform well on disjoint tasks but also adapt to unseen tasks. We observe an average performance increase of 2% on unseen domains and a 4% improvement in adaptation performance. Furthermore, we demonstrate that MAML-en-LLM outperforms baselines in settings with a limited amount of training data on both seen and unseen domains by an average of 2%. Finally, we discuss the effects of task type, optimizer, and task complexity, an avenue barely explored in the meta-training literature. Exhaustive experiments across 7 task settings along with two data settings demonstrate that models trained with MAML-en-LLM outperform SOTA meta-training approaches.
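The bi-level scheme summarized in the TL;DR (inner-task adaptation followed by an outer meta-update with second-order gradients) can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes scalar quadratic task losses, a single inner step, and plain gradient descent in place of an optimizer with shared moments. All names (`task_grad`, `inner_adapt`, `meta_gradient`, `meta_train`) are hypothetical.

```python
# Toy second-order MAML on scalar quadratic tasks (illustrative assumption,
# not the paper's setup). Task i has loss L_i(theta) = 0.5 * (theta - c_i)**2.

def task_grad(theta, c):
    """Gradient of 0.5 * (theta - c)**2 with respect to theta."""
    return theta - c

def inner_adapt(theta, c, inner_lr):
    """One inner gradient step on a single task."""
    return theta - inner_lr * task_grad(theta, c)

def meta_gradient(theta, c, inner_lr, second_order=True):
    """Gradient of the post-adaptation loss with respect to the initial theta."""
    adapted = inner_adapt(theta, c, inner_lr)
    outer = task_grad(adapted, c)  # dL/d(adapted)
    # d(adapted)/d(theta) = 1 - inner_lr * Hessian; the Hessian is 1 here.
    # Dropping this factor gives the first-order approximation.
    jacobian = (1.0 - inner_lr * 1.0) if second_order else 1.0
    return outer * jacobian

def meta_train(task_targets, theta=0.0, inner_lr=0.1, outer_lr=0.5, steps=200):
    """Average meta-gradients over all tasks, then take one outer step."""
    for _ in range(steps):
        grads = [meta_gradient(theta, c, inner_lr) for c in task_targets]
        theta -= outer_lr * sum(grads) / len(grads)
    return theta

theta_star = meta_train([-1.0, 0.5, 2.0])
```

In this toy setting the meta-learned initialization settles at the mean of the task targets, the point from which one inner step reduces the loss most evenly across tasks. In an LLM setting the Jacobian factor would be obtained by differentiating through the inner update with automatic differentiation instead of a closed form.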
