In-Context Learning with Hypothesis-Class Guidance

Ziqian Lin, Shubham Kumar Bharti, Kangwook Lee

TL;DR

This work studies how explicit task instructions, given as a hypothesis-class prefix, improve in-context learning (ICL). A finite hypothesis class $\mathcal{H}$ is encoded and concatenated with in-context $(x,y)$ demonstrations, and a Transformer trained on these sequences learns both to predict labels and to identify the underlying hypothesis, generalizing to unseen hypotheses and even unseen hypothesis classes. The key findings are that (i) Transformers can acquire ICL with hypothesis-class guidance (ICL-HCG) and generalize in both in-distribution (ID) and out-of-distribution (OOD) settings, (ii) explicit hypothesis-class instruction yields markedly higher accuracy than instruction-agnostic ICL, and (iii) pretraining hypothesis diversity and model architecture affect OOD generalization and length generalization. The results highlight the practical potential of instruction-guided ICL and provide a controlled framework for studying ICL mechanisms, generalization, and the effects of data structure and prompts on LLM behavior.

Abstract

Recent research has investigated the underlying mechanisms of in-context learning (ICL) both theoretically and empirically, often using data generated from simple function classes. However, existing work often focuses on sequences consisting solely of labeled examples, whereas in practice labeled examples are typically accompanied by an instruction that provides side information about the task. In this work, we propose ICL with hypothesis-class guidance (ICL-HCG), a novel synthetic data model for ICL in which the input context consists of the literal description of a (finite) hypothesis class $\mathcal{H}$ and $(x,y)$ pairs from a hypothesis chosen from $\mathcal{H}$. Within the ICL-HCG framework, we conduct extensive experiments to explore: (i) a variety of generalization abilities to new hypothesis classes; (ii) different model architectures; (iii) sample complexity; (iv) in-context data imbalance; (v) the role of instruction; and (vi) the effect of pretraining hypothesis diversity. We show that (a) Transformers can successfully learn ICL-HCG and generalize to unseen hypotheses and unseen hypothesis classes, and (b) compared with ICL without instruction, ICL-HCG achieves significantly higher accuracy, demonstrating the role of instructions.
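
To make the data model concrete, the following is a minimal sketch of how one ICL-HCG sequence could be assembled: a prefix describing every hypothesis in $\mathcal{H}$, followed by $(x,y)$ pairs drawn from one hypothesis in $\mathcal{H}$. It assumes each hypothesis is a binary label table over a small finite input space; the token format (`x=…`, `y=…`, `<sep>`) and the function names `encode_prefix` and `build_sequence` are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Tuple

# Assumption: a hypothesis is a table of labels for inputs 0..n-1.
Hypothesis = Tuple[int, ...]


def encode_prefix(hypothesis_class: List[Hypothesis]) -> List[str]:
    """Flatten every hypothesis's label table into tokens (hypothetical format)."""
    tokens: List[str] = []
    for h in hypothesis_class:
        tokens.extend(f"y={label}" for label in h)
        tokens.append("<sep>")  # marks the end of one hypothesis
    return tokens


def build_sequence(
    hypothesis_class: List[Hypothesis], num_pairs: int, rng: random.Random
) -> Tuple[List[str], int]:
    """Return (tokens, index of the hypothesis that generated the labels)."""
    target_index = rng.randrange(len(hypothesis_class))
    h = hypothesis_class[target_index]
    tokens = encode_prefix(hypothesis_class)  # the instruction part
    for _ in range(num_pairs):
        x = rng.randrange(len(h))  # sample an input uniformly
        tokens += [f"x={x}", f"y={h[x]}"]  # label it with the chosen hypothesis
    return tokens, target_index
```

Under these assumptions, a model trained with next-token prediction on such sequences sees the candidate hypotheses before the demonstrations, and the returned index can serve as the target for hypothesis identification. Dropping the prefix from `build_sequence` would give the instruction-free baseline the abstract compares against.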

Paper Structure

This paper contains 54 sections, 10 equations, 18 figures, 3 tables, 1 algorithm.

Figures (18)

  • Figure 1: Common ICL framework vs. ours. Conventional frameworks with synthetic datasets often construct sequences by concatenating multiple $({\bm{x}}, {\bm{y}})$ pairs, overlooking the importance of instructions. In contrast, our approach explicitly incorporates instructions through a hypothesis prefix. Specifically, we transform the hypothesis class $\mathcal{H}$ into a sequence that is prepended to the sequence of $({\bm{x}}, {\bm{y}})$ pairs and then fed into a Transformer. We refer to this method as in-context learning with hypothesis-class guidance (ICL-HCG). (Real-world examples are demonstrated using the GPT-4 Legacy model.)
  • Figure 2: Four types of generalization. An illustration of ID and OOD hypothesis class generalization and of ID and OOD hypothesis class size generalization.
  • Figure 3: Learning ICL-HCG via Transformer. We begin by sampling a subset of the hypothesis universe as the hypothesis class $\mathcal{H}$. Next, we encode the hypothesis class $\mathcal{H}$ and concatenate it with the context query into a unified sequence of tokens. This sequence is fed into a Transformer model, which is trained with next-token prediction and evaluated on the accuracy of predicting $y$ and of hypothesis identification. (This figure is a simplified illustration. Please refer to Appendix \ref{app:prefix} and Fig. \ref{fig:frameworkfull} for the full details.)
  • Figure 4: The generation of training and testing hypothesis classes. The hypothesis universe is partitioned into two pools: one for generating training and ID testing hypothesis classes, and another for generating OOD testing hypothesis classes.
  • Figure 5: Multiple runs on ID and OOD hypothesis class generalization. (Different runs use different random seeds for training and testing.) The Transformer successfully learns ICL-HCG and generalizes to new hypothesis classes and hypotheses. Generalization on ID hypotheses is easier than on OOD hypotheses. Refer to Appendix \ref{subapp:4generalization}, Fig. \ref{fig:multiple_curves_IO_2x3} for more curves of loss, training accuracy, and testing accuracy.
  • ...and 13 more figures
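
The sampling steps described in the captions of Figures 3 and 4 can be sketched as follows: split a hypothesis universe into two disjoint pools, then draw hypothesis classes from one pool for training and ID testing and from the other for OOD testing. This is an illustration only. The toy universe of all binary labelings, the `id_fraction` parameter, and the function names are assumptions introduced here and do not come from the paper.

```python
import itertools
import random
from typing import List, Tuple

Hypothesis = Tuple[int, ...]


def hypothesis_universe(num_inputs: int) -> List[Hypothesis]:
    """Assumed toy universe: all binary labelings of num_inputs inputs."""
    return list(itertools.product((0, 1), repeat=num_inputs))


def split_universe(
    universe: List[Hypothesis], id_fraction: float, rng: random.Random
) -> Tuple[List[Hypothesis], List[Hypothesis]]:
    """Partition the universe into disjoint (ID pool, OOD pool)."""
    shuffled = universe[:]  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * id_fraction)
    return shuffled[:cut], shuffled[cut:]


def sample_class(
    pool: List[Hypothesis], class_size: int, rng: random.Random
) -> List[Hypothesis]:
    """Draw a hypothesis class: class_size distinct hypotheses from one pool."""
    return rng.sample(pool, class_size)
```

In this sketch, training and ID test classes would both come from `sample_class(id_pool, ...)`, while OOD test classes would come from `sample_class(ood_pool, ...)`, so OOD classes contain only hypotheses never seen during training. Varying `class_size` between training and testing would correspond to the size-generalization settings.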

Theorems & Definitions (4)

  • Definition 3.1: ID Hypothesis Class Generalization
  • Definition 3.2: OOD Hypothesis Class Generalization
  • Definition 3.3: ID Hypothesis Class Size Generalization
  • Definition 3.4: OOD Hypothesis Class Size Generalization