Implicit In-context Learning

Zhuowei Li; Zihao Xu; Ligong Han; Yunhe Gao; Song Wen; Di Liu; Hao Wang; Dimitris N. Metaxas

TL;DR

The paper tackles the high cost and sensitivity of In-context Learning (ICL) by introducing Implicit In-context Learning (I2CL), which compresses demonstrations into a context vector derived from the model's activations and injects a linear combination of this vector and the query activations into the residual streams. This design removes the need to prepend demonstration tokens and fuse them through attention, reducing inference cost to zero-shot levels while preserving few-shot performance. Key contributions include a context-vectorization pipeline, a lightweight context-injection mechanism with layer-wise scalars learned via Noisy Self-calibration, and a demonstration-agnostic, task-id-like representation that enables transfer. Empirical results on nine tasks and three model architectures show few-shot-level performance, robustness to variations in the demonstration examples, and effective transfer between similar tasks, supported by ablations and analyses.

Abstract

In-context Learning (ICL) empowers large language models (LLMs) to swiftly adapt to unseen tasks at inference time by prefixing a few demonstration examples before queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is sensitive to the selection and order of demonstration examples. In this work, we introduce Implicit In-context Learning (I2CL), an innovative paradigm that reduces the inference cost of ICL to that of zero-shot learning with minimal information loss. I2CL first generates a condensed vector representation, namely a context vector, extracted from the demonstration examples. It then performs an inference-time intervention by injecting a linear combination of the context vector and query activations back into the model's residual streams. Empirical evaluation on nine real-world tasks across three model architectures demonstrates that I2CL achieves few-shot-level performance at zero-shot inference cost and is robust to variations in demonstration examples. Furthermore, I2CL facilitates a novel representation of task-ids, enhancing task similarity detection and fostering effective transfer learning. We also perform a comprehensive analysis and ablation study of I2CL, offering deeper insights into its internal mechanisms. Code is available at https://github.com/LzVv123456/I2CL.
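
The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `context_vector` and `inject`, the use of an element-wise mean to aggregate demonstration activations, and the scalar names `lam` and `beta` are assumptions made for illustration, and random arrays stand in for real model activations.

```python
import numpy as np

def context_vector(demo_activations: np.ndarray) -> np.ndarray:
    """Condense per-demonstration activations of shape [n_demos, d] into one vector.

    Assumption: aggregation by element-wise mean, which makes the result
    independent of the order of the demonstrations.
    """
    return demo_activations.mean(axis=0)

def inject(residual: np.ndarray, query_act: np.ndarray,
           ctx_vec: np.ndarray, lam: float, beta: float) -> np.ndarray:
    """Add a linear combination of the context vector and the query
    activation to the residual stream (hypothetical per-layer scalars)."""
    return residual + lam * ctx_vec + beta * query_act

rng = np.random.default_rng(0)
demos = rng.normal(size=(5, 8))      # stand-in activations for 5 demonstrations
query_act = rng.normal(size=8)       # stand-in activation for the query
residual = np.zeros(8)               # stand-in residual stream state

v = context_vector(demos)
zero_shot = inject(residual, query_act, v, lam=0.0, beta=1.0)
steered = inject(residual, query_act, v, lam=0.1, beta=1.0)
```

With `lam=0.0` and `beta=1.0` the update reduces to the ordinary residual addition, so the query is processed exactly as in zero-shot inference; a non-zero `lam` shifts the residual stream toward the context vector without adding any tokens to the prompt.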

Paper Structure (27 sections, 7 equations, 7 figures, 15 tables, 1 algorithm)

Introduction
Methodology
Preliminaries
Context Vectorization
Context Injection
Noisy Self-calibration
Experiments
Benchmarking I2CL
On the Formation of the Context Vector
Analysis of Calibrated Linear Coefficients
Ablation Study
Related Work
Limitations
Conclusion
Prompting Templates
...and 12 more sections

Figures (7)

Figure 1: Upper-left is better. Comparison of accuracy, inference speed and cached memory of different methods on Llama2-7b.
Figure 2: A schematic overview of I2CL, including a single layer for illustrative purpose.
Figure 3: Scaling trend of I2CL.
Figure 4: Left: Evaluation of I2CL and few-shot learning under deficient demonstrations. The symbol $*$ denotes the results under deficient demonstration examples. "Unseen demo" refers to the evaluation of calibrated coefficients on unseen demonstrations. Middle: Analysis of the influencing factors of context vectors. "Random-label" indicates random input-label mappings. "Random-order" refers to the random permutation of words. "W/o format" signifies excluding the template tokens during the creation of context vectors. Right: t-SNE plot of context vectors. Each circle denotes a context vector generated using a group of randomly sampled demonstration examples.
Figure 5: Left: t-SNE visualization of calibrated linear coefficients. Each circle denotes a run with a different random seed. Middle: Transfer results among various tasks. Each row represents a source task and each column denotes a target task. Red and blue colors signify positive and negative transfer outcomes, respectively. Right: Calibrated linear coefficients for SST-2. $\lambda^a, \beta^a, \lambda^m, \beta^m$ are the layer-wise coefficients described in the context-injection equation.
...and 2 more figures
