
A Survey to Recent Progress Towards Understanding In-Context Learning

Haitao Mao, Guangliang Liu, Yao Ma, Rongrong Wang, Kristen Johnson, Jiliang Tang

TL;DR

This paper tackles the problem of understanding In-Context Learning (ICL) in large language models by introducing a data generation perspective that frames ICL in terms of data-generation functions. It distinguishes two core abilities: skill recognition (selecting a data-generation function learned during pretraining) and skill learning (acquiring a new function in context), and develops a corresponding analysis framework for each, based on Bayesian inference and the function learning paradigm, respectively. The authors synthesize theoretical and empirical results showing that ICL can act as a Bayes-optimal selector within a constrained function class and, in larger models, can implement gradient-descent-like updates or closed-form solutions. They also outline future directions, including extending these frameworks to more realistic data-generation scenarios, exploring emergent skill composition, and guiding safer, more robust use of ICL in real-world applications.
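To make the skill-learning claim concrete, the toy sketch below compares the two mechanisms named above, gradient-descent-like updates and a closed-form solution, on an in-context linear-regression task. The task, the two-dimensional inputs, the weights w_true = (2, -1), and the helper names gradient_descent and closed_form_ls are illustrative assumptions rather than details from the paper, and the code runs an explicit optimizer, not a transformer. It only shows that enough gradient steps on the demonstrations recover the same function as the least-squares closed form.

```python
# Toy "skill learning" sketch (illustrative assumptions throughout):
# the in-context demonstrations come from a linear function
# y = w_true . x with w_true = (2, -1), assumed unseen during pretraining.


def closed_form_ls(xs, ys):
    """Least squares for 2-D inputs via the 2x2 normal equations."""
    # Entries of X^T X = [[a, b], [b, d]] and X^T y = [r0, r1].
    a = sum(x[0] * x[0] for x in xs)
    b = sum(x[0] * x[1] for x in xs)
    d = sum(x[1] * x[1] for x in xs)
    r0 = sum(x[0] * y for x, y in zip(xs, ys))
    r1 = sum(x[1] * y for x, y in zip(xs, ys))
    det = a * d - b * b
    # Inverse of the symmetric 2x2 matrix applied to X^T y.
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]


def gradient_descent(xs, ys, lr=0.1, steps=500):
    """Plain full-batch gradient descent on the mean squared error."""
    w = [0.0, 0.0]
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1 / 2n) * sum_i (w . x_i - y_i)^2.
        g = [0.0, 0.0]
        for x, y in zip(xs, ys):
            err = w[0] * x[0] + w[1] * x[1] - y
            g[0] += err * x[0] / n
            g[1] += err * x[1] / n
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w


# Hypothetical in-context demonstrations (x_i, y_i) with y_i = 2*x0 - x1.
xs = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [2 * x0 - x1 for x0, x1 in xs]

w_cf = closed_form_ls(xs, ys)
w_gd = gradient_descent(xs, ys)
print("closed form:", w_cf)  # -> closed form: [2.0, -1.0]
print("gradient descent:", w_gd)
```

With the step size and step count assumed here, the gradient-descent estimate agrees with the closed-form weights to within floating-point precision; cutting the number of steps leaves it short of them, which is the practical difference between the two mechanisms in this toy setting.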

Abstract

In-Context Learning (ICL) empowers Large Language Models (LLMs) with the ability to learn from a few examples provided in the prompt, enabling downstream generalization without requiring gradient updates. Despite encouraging empirical success, the underlying mechanism of ICL remains unclear. Existing research remains ambiguous, offering various viewpoints and relying on intuition-driven, ad hoc technical solutions to interpret ICL. In this paper, we leverage a data generation perspective to reinterpret recent efforts from a systematic angle, demonstrating the potentially broader applicability of these popular technical solutions. For conceptual definitions, we rigorously adopt the terms skill recognition and skill learning. Skill recognition selects one of the data generation functions learned during pre-training, whereas skill learning acquires new data generation functions from in-context data. Furthermore, we provide insights into the strengths and weaknesses of both abilities, emphasizing their commonalities through the perspective of data generation. This analysis suggests potential directions for future research.
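Skill recognition can be sketched as Bayesian posterior inference over a fixed set of data-generation functions: the in-context demonstrations reweight a prior over candidate functions, and the prediction uses the most probable one. The candidate functions, the uniform prior, the simple label-noise likelihood with rate eps, and the helper name skill_posterior are all hypothetical choices for illustration; the sketch performs the inference explicitly and makes no claim about how an LLM realizes it internally.

```python
import math

# Hypothetical pretraining "skills": candidate data-generation functions.
SKILLS = {
    "add_one": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}


def skill_posterior(skills, prior, demos, eps=0.1):
    """Posterior over skills given (x, y) demonstrations.

    Assumed (simplified) likelihood: a demo contributes 1 - eps if the
    skill reproduces its label and eps otherwise.
    """
    log_post = {}
    for name, f in skills.items():
        log_lik = sum(
            math.log(1 - eps) if f(x) == y else math.log(eps) for x, y in demos
        )
        log_post[name] = math.log(prior[name]) + log_lik
    # Normalise in log space (log-sum-exp) for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {name: math.exp(v - m) / z for name, v in log_post.items()}


prior = {name: 1 / len(SKILLS) for name in SKILLS}  # uniform prior
demos = [(1, 2), (2, 4), (3, 6)]  # hypothetical in-context demonstrations
post = skill_posterior(SKILLS, prior, demos)

# Skill recognition: pick the most probable skill and apply it to the query.
best = max(post, key=post.get)
print(best, round(post[best], 3), SKILLS[best](5))  # -> double 0.976 10
```

Because the posterior only ranges over the assumed set, a demonstration that no candidate reproduces can shift probability mass but cannot introduce a new function; in the survey's terms, that would require skill learning.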

Paper Structure (35 sections, 3 equations, 1 figure, 1 table)

This paper contains 35 sections, 3 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Illustration of ICL for sentiment analysis. The upper instances (with a gray background) are the labeled in-context demonstrations, while the last line is the query whose sentiment label the LLM infers.
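A prompt laid out like the one in Figure 1 can be assembled as follows. The review texts, labels, the "Review:"/"Sentiment:" template, and the helper name build_icl_prompt are hypothetical stand-ins, since the figure's exact wording is not reproduced here; the sketch only shows the layout of labeled demonstrations followed by an unlabeled query.

```python
def build_icl_prompt(demos, query):
    """Format labeled demonstrations followed by one unlabeled query.

    The "Review:" / "Sentiment:" template is an assumed format.
    """
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in demos]
    # The query repeats the template but leaves the label for the LLM to fill.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Hypothetical demonstrations and query.
demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I walked out halfway through.", "negative"),
]
prompt = build_icl_prompt(demos, "A charming, well-acted film.")
print(prompt)
```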