Implicit In-context Learning

Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas

TL;DR

The paper tackles the high inference cost of In-context Learning (ICL) and its sensitivity to demonstration choice by introducing Implicit In-context Learning (I2CL). I2CL compresses the demonstrations into a context vector extracted from the model's activations, then injects a linear combination of this vector and the query activations into the residual streams. Because the demonstrations no longer appear as prompt tokens and are not fused through attention, inference cost drops to zero-shot levels while few-shot performance is preserved. Key contributions include a context-vectorization pipeline, a lightweight context-injection mechanism with layer-wise scalars learned via Noisy Self-calibration, and a demonstration-agnostic, task-id-like representation that enables transfer. Empirical results across nine tasks and three model families show few-shot-level performance, robustness to variations in the demonstrations, and effective transfer between tasks, supported by ablations and analyses.

Abstract

In-context Learning (ICL) empowers large language models (LLMs) to swiftly adapt to unseen tasks at inference-time by prefixing a few demonstration examples before queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is sensitive to the selection and order of demonstration examples. In this work, we introduce Implicit In-context Learning (I2CL), an innovative paradigm that reduces the inference cost of ICL to that of zero-shot learning with minimal information loss. I2CL operates by first generating a condensed vector representation, namely a context vector, extracted from the demonstration examples. It then conducts an inference-time intervention through injecting a linear combination of the context vector and query activations back into the model's residual streams. Empirical evaluation on nine real-world tasks across three model architectures demonstrates that I2CL achieves few-shot level performance at zero-shot inference cost, and it exhibits robustness against variations in demonstration examples. Furthermore, I2CL facilitates a novel representation of task-ids, enhancing task similarity detection and fostering effective transfer learning. We also perform a comprehensive analysis and ablation study on I2CL, offering deeper insights into its internal mechanisms. Code is available at https://github.com/LzVv123456/I2CL.
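
The two steps described in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes the context vector is a per-layer mean over demonstration activations, and that injection adds a linear combination of context and query activations to the residual stream separately for the attention and MLP branches. The coefficient names follow the Figure 5 caption ($\lambda^a, \beta^a, \lambda^m, \beta^m$); the function names, shapes, and coefficient values are hypothetical.

```python
import numpy as np


def context_vector(demo_activations: np.ndarray) -> np.ndarray:
    """Condense demonstrations into one vector per layer.

    demo_activations has shape (num_demos, num_layers, hidden_dim).
    Assumption: a plain mean over demonstrations, which makes the
    result independent of demonstration order.
    """
    return demo_activations.mean(axis=0)


def fuse(query_activation, context, lam, beta):
    """Linear combination of a context vector and a query activation."""
    return lam * context + beta * query_activation


def residual_step(residual, attn_out, mlp_out, ctx_attn, ctx_mlp, coeffs):
    """One layer's residual update with injected context (toy version).

    In a real transformer mlp_out is computed from the post-attention
    residual; here both branch outputs are passed in for simplicity.
    """
    lam_a, beta_a, lam_m, beta_m = coeffs
    residual = residual + fuse(attn_out, ctx_attn, lam_a, beta_a)
    residual = residual + fuse(mlp_out, ctx_mlp, lam_m, beta_m)
    return residual


rng = np.random.default_rng(0)
demos_attn = rng.normal(size=(5, 3, 8))  # 5 demos, 3 layers, hidden size 8
demos_mlp = rng.normal(size=(5, 3, 8))
ctx_attn = context_vector(demos_attn)
ctx_mlp = context_vector(demos_mlp)

residual = rng.normal(size=8)
attn_out = rng.normal(size=8)
mlp_out = rng.normal(size=8)

# Hypothetical coefficients for layer 0; in the paper they are learned.
injected = residual_step(
    residual, attn_out, mlp_out, ctx_attn[0], ctx_mlp[0], (0.1, 1.0, 0.1, 1.0)
)
# With lambda = 0 and beta = 1 the update reduces to the ordinary residual sum.
zero_shot = residual_step(
    residual, attn_out, mlp_out, ctx_attn[0], ctx_mlp[0], (0.0, 1.0, 0.0, 1.0)
)
```

Because the demonstrations enter only through fixed per-layer vectors, the query is processed at its own length, which is the sense in which the inference cost matches zero-shot inference.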

Paper Structure (27 sections, 7 equations, 7 figures, 15 tables, 1 algorithm)

Figures (7)

  • Figure 1: Upper-left is better. Comparison of accuracy, inference speed and cached memory of different methods on Llama2-7b.
  • Figure 2: A schematic overview of I2CL, including a single layer for illustrative purpose.
  • Figure 3: Scaling trend of I2CL.
  • Figure 4: Left: Evaluation of I2CL and few-shot learning under deficient demonstrations. The symbol $*$ denotes the results under deficient demonstration examples. "Unseen demo" refers to the evaluation of calibrated coefficients on unseen demonstrations. Middle: Analysis of the influencing factors of context vectors. "Random-label" indicates random input-label mappings. "Random-order" refers to the random permutation of words. "W/o format" signifies excluding the template tokens during the creation of context vectors. Right: t-SNE plot of context vectors. Each circle denotes a context vector generated using a group of randomly sampled demonstration examples.
  • Figure 5: Left: t-SNE visualization of calibrated linear coefficients. Each circle denotes a run with a different random seed. Middle: Transfer results among tasks. Each row represents a source task and each column a target task. Red and blue signify positive and negative transfer, respectively. Right: Calibrated linear coefficients for SST-2. $\lambda^a, \beta^a, \lambda^m, \beta^m$ are the layer-wise coefficients described in the paper's context-injection equation.
  • ...and 2 more figures