TEGEE: Task dEfinition Guided Expert Ensembling for Generalizable and Few-shot Learning
Xingwei Qu, Yiming Liang, Yucheng Wang, Tianyu Zheng, Tommy Yue, Xingyuan Bu, Lei Ma, Stephen W. Huang, Jiajun Zhang, Yinan Shi, Chenghua Lin, Jie Fu, Ge Zhang
TL;DR
TEGEE tackles the core bottleneck of in-context learning by explicitly extracting task definitions and using them to guide expert ensembling. It uses two 3B models: one extracts the task definition and the other learns from demonstrations. This design is augmented with a LoRA-based dynamic expert pool and a retrieval-enforced weighting scheme. Empirical results on SuperNI show TEGEE performing on par with LLaMA2-13B and outperforming 7B baselines, while adding new experts to the pool enables continual few-shot learning. The work highlights the primacy of task-definition extraction in ICL and demonstrates how modular, task-aware ensembling can extend few-shot learning to many-shot regimes, with practical benefits for continual learning.
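The TL;DR names a LoRA-based dynamic expert pool with retrieval-enforced weighting but does not spell out its mechanics. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each expert is keyed by an embedding of the task definition it was trained on. It further assumes that weights come from a temperature-scaled softmax over cosine similarities between the extracted definition's embedding and those keys, and that the retrieved experts' low-rank updates (delta W = B A) are merged as a weighted sum. The names `LoRAExpert`, `ExpertPool`, `top_k`, and `temperature` are hypothetical. `add` illustrates continual pool augmentation: a new task contributes a new expert without modifying existing ones.

```python
import math
from dataclasses import dataclass


@dataclass
class LoRAExpert:
    """Hypothetical LoRA expert keyed by an embedding of its task definition."""
    name: str
    key: list[float]      # assumed: embedding of the task definition this expert was trained on
    A: list[list[float]]  # LoRA down-projection, shape r x d_in
    B: list[list[float]]  # LoRA up-projection, shape d_out x r


def matmul(X, Y):
    """Plain-Python matrix product X @ Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(scores, temperature=1.0):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ExpertPool:
    def __init__(self):
        self.experts: list[LoRAExpert] = []

    def add(self, expert):
        # Continual augmentation: a new task adds a new expert; existing experts are untouched.
        self.experts.append(expert)

    def weights(self, query, top_k=2, temperature=0.1):
        """Retrieve the top_k experts whose keys best match the query embedding
        (assumed: an embedding of the extracted task definition) and softmax their scores."""
        if not self.experts:
            raise ValueError("expert pool is empty")
        scored = sorted(
            ((cosine(query, e.key), e) for e in self.experts),
            key=lambda pair: pair[0],
            reverse=True,
        )[:top_k]
        probs = softmax([s for s, _ in scored], temperature)
        return [(e, p) for (_, e), p in zip(scored, probs)]

    def merged_delta(self, query, top_k=2, temperature=0.1):
        """Weighted sum of the retrieved experts' low-rank updates B @ A
        (the usual LoRA scaling factor is omitted for brevity)."""
        merged = None
        for expert, w in self.weights(query, top_k, temperature):
            delta = matmul(expert.B, expert.A)
            scaled = [[w * x for x in row] for row in delta]
            merged = scaled if merged is None else [
                [m + s for m, s in zip(mrow, srow)] for mrow, srow in zip(merged, scaled)
            ]
        return merged


if __name__ == "__main__":
    pool = ExpertPool()
    pool.add(LoRAExpert("sentiment", [1.0, 0.0], [[1.0, 0.0]], [[1.0], [0.0]]))
    pool.add(LoRAExpert("summarize", [0.0, 1.0], [[0.0, 1.0]], [[0.0], [1.0]]))
    query = [0.9, 0.1]  # stand-in for an embedded task definition
    for expert, w in pool.weights(query):
        print(expert.name, round(w, 4))
    print(pool.merged_delta(query))
```

A low `temperature` pushes the weighting toward hard top-1 routing, while a higher one blends experts more evenly. The right setting for TEGEE is not stated here and would need to be checked against the paper.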
Abstract
Large Language Models (LLMs) exhibit the ability to perform in-context learning (ICL), acquiring new tasks directly from the examples provided as demonstrations. This process is thought to operate through an implicit task selection mechanism that extracts task definitions from these demonstrations and then processes them. However, critical questions remain: which is more essential, extracting the task definition or processing it? And how can these capabilities be further improved? To address these questions, we propose TEGEE (Task Definition Guided Expert Ensembling), a method that explicitly extracts task definitions and generates responses conditioned on the identified task. Our framework employs a dual 3B-model approach in which each model has a distinct role: one extracts task definitions, while the other learns from demonstrations. This modular approach supports the hypothesis that extracting task definitions is more vital than processing the task itself. Empirical evaluations show that TEGEE performs comparably to the larger LLaMA2-13B model. Owing to its modular design, our approach extends traditional ICL from few-shot to many-shot learning, supporting an unlimited number of demonstrations and enhancing continual learning capabilities.
