In-Context Learning with Hypothesis-Class Guidance
Ziqian Lin, Shubham Kumar Bharti, Kangwook Lee
TL;DR
This work studies how to enhance in-context learning (ICL) by injecting explicit task instructions in the form of a hypothesis-class prefix. By encoding a finite hypothesis class $\mathcal{H}$ and concatenating it with in-context $(x,y)$ demonstrations, a Transformer learns both to predict labels and to identify the underlying hypothesis. It generalizes to unseen hypotheses and even to unseen hypothesis classes. Key findings show that (i) Transformers can acquire this capability, ICL with hypothesis-class guidance (ICL-HCG), and generalize in both in-distribution (ID) and out-of-distribution (OOD) settings; (ii) explicit hypothesis-class instructions yield markedly higher accuracy than ICL without instructions; and (iii) pretraining hypothesis diversity and model architecture affect both OOD generalization and length generalization. The results highlight the practical potential of instruction-guided ICL. They also provide a controlled framework for studying ICL mechanisms, generalization, and the effects of data structure and prompts on LLM behavior.
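To see why a hypothesis-class prefix can help, consider an idealized learner that is told $\mathcal{H}$. It keeps only the hypotheses consistent with the demonstrations (version-space elimination) and predicts by majority vote among them. The sketch below is a hypothetical illustration of that idea, not the Transformer studied in the paper. It assumes binary labels over a small input domain $\{0, \dots, n-1\}$, with each hypothesis written as a tuple of labels. The helper names `consistent_hypotheses` and `predict` are invented for this example.

```python
from collections import Counter
from itertools import product
from typing import List, Sequence, Tuple

# Assumption: a hypothesis is a tuple of binary labels, one per input x in {0, ..., n-1}.
Hypothesis = Tuple[int, ...]


def consistent_hypotheses(
    H: Sequence[Hypothesis], demos: Sequence[Tuple[int, int]]
) -> List[Hypothesis]:
    """Return the hypotheses in H that agree with every (x, y) demonstration."""
    return [h for h in H if all(h[x] == y for x, y in demos)]


def predict(H: Sequence[Hypothesis], demos: Sequence[Tuple[int, int]], x_query: int) -> int:
    """Majority-vote prediction over the hypotheses still consistent with the demos."""
    version_space = consistent_hypotheses(H, demos)
    if not version_space:
        raise ValueError("no hypothesis in H is consistent with the demonstrations")
    votes = Counter(h[x_query] for h in version_space)
    return votes.most_common(1)[0][0]


# A small hypothesis class over 4 inputs, given as the "instruction".
H = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 1, 0, 0)]
# Without an instruction, every binary function on 4 inputs remains possible.
H_all = list(product([0, 1], repeat=4))
```

With $\mathcal{H}$ given, the two demonstrations $(0,0)$ and $(1,1)$ already pin down the single hypothesis `(0, 1, 0, 1)`. Against all 16 functions on four inputs, the same two demonstrations still leave four candidates, and those candidates split evenly on inputs 2 and 3.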
Abstract
Recent research has investigated the mechanisms underlying in-context learning (ICL) both theoretically and empirically, often using data generated from simple function classes. However, existing work typically focuses on sequences consisting solely of labeled examples. In practice, labeled examples are usually accompanied by an instruction that provides side information about the task. In this work, we propose ICL with hypothesis-class guidance (ICL-HCG), a novel synthetic data model for ICL in which the input context consists of a literal description of a finite hypothesis class $\mathcal{H}$ followed by $(x,y)$ pairs generated by a hypothesis chosen from $\mathcal{H}$. Within the ICL-HCG framework, we conduct extensive experiments to explore: (i) a variety of generalization abilities to new hypothesis classes; (ii) different model architectures; (iii) sample complexity; (iv) in-context data imbalance; (v) the role of instruction; and (vi) the effect of pretraining hypothesis diversity. As a result, we show that (a) Transformers can successfully learn ICL-HCG and generalize to unseen hypotheses and unseen hypothesis classes, and (b) ICL-HCG achieves significantly higher accuracy than ICL without instruction, demonstrating the value of instructions.
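The data model described above can be sketched as a small sequence generator. The sketch makes three assumptions. First, hypotheses are binary functions over a finite domain. Second, $\mathcal{H}$ is sampled uniformly from all such functions. Third, the context is rendered as a text string. The paper's actual tokenization, separators, and sampling scheme are not given here, so the markers `[HYPOTHESIS CLASS]` and `[DATA]`, the `x>y` encoding, and the function names are hypothetical.

```python
import random
from itertools import product
from typing import List, Tuple

# Assumption: a hypothesis is a tuple of binary labels, one per input x in {0, ..., n-1}.
Hypothesis = Tuple[int, ...]


def sample_hypothesis_class(n_inputs: int, class_size: int, rng: random.Random) -> List[Hypothesis]:
    """Draw a finite hypothesis class H of distinct binary functions on n_inputs points."""
    all_hypotheses = list(product([0, 1], repeat=n_inputs))
    return rng.sample(all_hypotheses, class_size)  # raises ValueError if class_size > 2**n_inputs


def encode_hypothesis(h: Hypothesis) -> str:
    """Hypothetical literal description of a hypothesis, e.g. (0, 1) -> '0>0 1>1'."""
    return " ".join(f"{x}>{y}" for x, y in enumerate(h))


def build_icl_hcg_prompt(
    H: List[Hypothesis], n_demos: int, rng: random.Random
) -> Tuple[str, Hypothesis, List[Tuple[int, int]]]:
    """Build one sequence: description of H as a prefix, then (x, y) pairs from h* in H."""
    h_star = rng.choice(H)  # the hidden target hypothesis
    xs = [rng.randrange(len(h_star)) for _ in range(n_demos)]
    pairs = [(x, h_star[x]) for x in xs]
    header = " | ".join(encode_hypothesis(h) for h in H)
    demos = " ".join(f"({x},{y})" for x, y in pairs)
    return f"[HYPOTHESIS CLASS] {header} [DATA] {demos}", h_star, pairs


rng = random.Random(0)
H = sample_hypothesis_class(n_inputs=4, class_size=3, rng=rng)
prompt, h_star, pairs = build_icl_hcg_prompt(H, n_demos=5, rng=rng)
print(prompt)
```

Dropping the `[HYPOTHESIS CLASS]` prefix from such a sequence yields the instruction-free ICL setting that the abstract uses as a comparison point.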
