What Do Language Models Learn in Context? The Structured Task Hypothesis

Jiaoda Li; Yifan Hou; Mrinmaya Sachan; Ryan Cotterell

TL;DR

The paper empirically investigates how large language models learn in context by testing three competing hypotheses: task selection, meta-learning of a learning algorithm, and composition of pre-trained tasks. Using response-altered (RA) and prompt-altered (PA) task constructions built on three text-classification datasets (CR, SST-2, AG News), primarily with the LLaMA2-70B model, the authors find that RA tasks are learnable in context whereas PA tasks are not, which argues against both a pure task-selection (recognition-based) account and a pure meta-learning account. They further report partial support for task composition: RA performance correlates with the learnability of the primitive g-tasks, and compositions with natural functions (e.g., synonyms) can be learned in context, suggesting that novel tasks can be formed by composing learned tasks. Overall, the results indicate that ICL may arise from composing learned primitives rather than from instantiating a pre-trained learning algorithm, with implications for understanding and harnessing in-context adaptation.

Abstract

Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering the theories underpinning ICL. One popular hypothesis explains ICL by task selection. LLMs identify the task based on the demonstration and generalize it to the prompt. Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration. Finally, a third hypothesis argues that LLMs use the demonstration to select a composition of tasks learned during pre-training to perform ICL. In this paper, we empirically explore these three hypotheses that explain LLMs' ability to learn in context with a suite of experiments derived from common text classification tasks. We invalidate the first two hypotheses with counterexamples and provide evidence in support of the last hypothesis. Our results suggest an LLM could learn a novel task in context via composing tasks learned during pre-training.

Paper Structure (47 sections, 1 theorem, 12 equations, 9 figures, 4 tables)

Introduction
Preliminaries
Language Models
A Restatement of the Hypotheses
Testing the Task-Selection Hypothesis
Experimental Setup
Tasks.
Experimental Setup.
Settings.
Implementation Details.
Results
ICL Example Number.
Model Size.
Summary.
Testing the Meta-Learning Hypothesis
...and 32 more sections

Key Result

Proposition 1

Task composition is associative, i.e., $(\tau_1\circ\tau_2)\circ\tau_3=\tau_1\circ(\tau_2\circ\tau_3)$.
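
To make the proposition concrete, the following minimal sketch treats a task as a plain function from strings to strings and checks associativity on a toy example. The task names (`sentiment`, `relabel`, `lowercase`), the label mapping, and the `compose` helper are hypothetical illustrations and are not taken from the paper.

```python
from typing import Callable, Dict, List

# Assumption: a "task" is modeled as a function from a string to a string.
Task = Callable[[str], str]


def compose(outer: Task, inner: Task) -> Task:
    """Return the composed task x -> outer(inner(x))."""
    return lambda x: outer(inner(x))


# Hypothetical toy tasks, for illustration only.
def sentiment(text: str) -> str:
    """Toy classifier: 'positive' if the text contains 'good', else 'negative'."""
    return "positive" if "good" in text else "negative"


# Hypothetical relabeling of the original responses.
RELABEL: Dict[str, str] = {"positive": "A", "negative": "B"}


def relabel(label: str) -> str:
    """Map an original label to its altered response."""
    return RELABEL[label]


def lowercase(label: str) -> str:
    """Lowercase a response string."""
    return label.lower()


# (lowercase o relabel) o sentiment  versus  lowercase o (relabel o sentiment)
left: Task = compose(compose(lowercase, relabel), sentiment)
right: Task = compose(lowercase, compose(relabel, sentiment))

inputs: List[str] = ["a good film", "a dull film"]
results_left = [left(x) for x in inputs]
results_right = [right(x) for x in inputs]

# Both groupings produce the same responses on every input.
print(results_left == results_right)
```

Because both groupings apply the same three functions in the same order, they agree on every input; the check above only illustrates this on two examples and is not a proof.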

Figures (9)

Figure 1: The illustration of three hypotheses.
Figure 2: Illustrations of a sentiment classification task, a response-altered (RA) task, and a prompt-altered (PA) task.
Figure 3: Performance of vanilla ICL and $\tau_{\text{RA}}$-ICL on the $3$ datasets with different demonstration lengths $L$. LLaMA2-70B is used. The LLM is able to learn RA tasks as $L$ grows.
Figure 4: Average performance of vanilla ICL and $\tau_{\text{RA}}$-ICL across $3$ datasets (CR, SST-2, AG News). Demonstration length $L=32$. LLaMA2-70B yields the best performance, but LLaMA2-7B is not far behind.
Figure 5: Performance of various settings across $3$ text classification tasks. LLaMA2-70B is used. $\tau_{\text{PA}}$-ICL performs worse than $\tau_{\text{PA}}$-LR and chance.
...and 4 more figures

Theorems & Definitions (2)

Proposition 1
proof
