Meta-in-context learning in large language models

Julian Coda-Forno, Marcel Binz, Zeynep Akata, Matthew Botvinick, Jane X. Wang, Eric Schulz

TL;DR

The paper investigates whether large language models can further enhance their own in-context learning through meta-in-context learning, enabling adaptation to new environments without finetuning. It tests GPT-3 on two artificial domains (one-dimensional regression and two-armed bandits) and a real-world regression benchmark, showing that sequential exposure to related tasks reshapes priors and learning strategies toward environmental statistics. The results demonstrate that meta-in-context learning achieves improvements within and across tasks, reduces extreme predictions, and can reach performance competitive with traditional algorithms. These findings open a path toward environment-aware adaptation of LLMs via context-driven meta-learning, with supplementary insights from GPT-4.

Abstract

Large language models have shown tremendous performance in a variety of tasks. In-context learning -- the ability to improve at a task after being provided with a number of demonstrations -- is seen as one of the main contributors to their success. In the present paper, we demonstrate that the in-context learning abilities of large language models can be recursively improved via in-context learning itself. We coin this phenomenon meta-in-context learning. Looking at two idealized domains, a one-dimensional regression task and a two-armed bandit task, we show that meta-in-context learning adaptively reshapes a large language model's priors over expected tasks. Furthermore, we find that meta-in-context learning modifies the in-context learning strategies of such models. Finally, we extend our approach to a benchmark of real-world regression problems, where we observe performance competitive with traditional learning algorithms. Taken together, our work improves our understanding of in-context learning and paves the way toward adapting large language models to the environment in which they are applied purely through meta-in-context learning rather than traditional finetuning.

Paper Structure (12 sections, 1 equation, 5 figures)

Figures (5)

  • Figure 1: High-level overview of our approach on an example of multiple three-shot regression tasks. We present an LLM with $N$ learning tasks in a row. Improvement within a task indicates that the model is capable of in-context learning. If in-context learning improves across multiple learning tasks, the model is also capable of meta-in-context learning.
  • Figure 2: Meta-in-context learning on the one-dimensional regression task (100 simulations). Error bars represent $95\%$ confidence intervals. A: MSE across trials for different models. B: GPT-3's MSE averaged over trials for each task. C: Effects of trial and task for estimating the MSE. D: GPT-3's prior expectations across tasks (blue) compared to the true task distribution (orange).
  • Figure 3: Meta-in-context learning on the two-armed bandit experiment (500 simulations). Error bars represent $95\%$ confidence intervals. A: Regrets across trials for different models. B: GPT-3's regrets averaged over trials for each task. C: Effects of trial and task for estimating the regret. D: GPT-3's prior expectation of rewards across games. E: Probit regression coefficients for different strategies and their interaction with task number.
  • Figure 4: Meta-in-context learning on the real-world regression benchmark (42 $\cdot$ 50 simulations). Error bars represent $95\%$ confidence intervals. A: RMSE across trials for different models. B: Percentage of predictions outside or equal to the extremes of the squashed target range. C: Effects of trial and task similarities for estimating the RMSE.
  • Figure 5: Meta-in-context learning on the regression on real-world data experiment (42 $\cdot$ 30 simulations). Error bars represent $95\%$ confidence intervals. A: RMSE across trials for different models. B: Percentage of predictions outside or equal to the extremes of the squashed target range. C: Effects of trial and task similarities for estimating the RMSE.
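
The sequential setup described in the Figure 1 caption, where several few-shot regression tasks are presented in a row within one context, might be sketched as below. This is a minimal illustration and not the authors' prompt format. The `Task N:` headers, the `x=..., y=...` line layout, the linear task distribution, and the helper names `sample_task` and `build_prompt` are all assumptions introduced here.

```python
import random


def sample_task(rng, n_shots=3):
    """Sample one hypothetical linear task and return n_shots (x, y) pairs."""
    slope = rng.gauss(2.0, 1.0)       # assumed task distribution, not from the paper
    intercept = rng.gauss(-1.0, 1.0)  # assumed task distribution, not from the paper
    pairs = []
    for _ in range(n_shots):
        x = round(rng.uniform(0.0, 10.0), 2)
        y = round(slope * x + intercept + rng.gauss(0.0, 0.5), 2)
        pairs.append((x, y))
    return pairs


def build_prompt(previous_tasks, current_pairs, query_x):
    """Concatenate finished tasks, then the current task with an open query."""
    lines = []
    for number, pairs in enumerate(previous_tasks, start=1):
        lines.append(f"Task {number}:")
        lines.extend(f"x={x}, y={y}" for x, y in pairs)
        lines.append("")  # blank line between tasks (assumed layout)
    lines.append(f"Task {len(previous_tasks) + 1}:")
    lines.extend(f"x={x}, y={y}" for x, y in current_pairs)
    lines.append(f"x={query_x}, y=")  # the model would complete this value
    return "\n".join(lines)


rng = random.Random(0)
tasks = [sample_task(rng) for _ in range(3)]
# Two finished tasks, then the third task with two observations and one query.
prompt = build_prompt(tasks[:2], tasks[2][:2], query_x=tasks[2][2][0])
print(prompt)
```

Under this layout, within-task improvement would be measured across the successive queries inside one `Task` block, and meta-in-context learning across the growing number of earlier blocks that precede the current one.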