Implicit In-context Learning
Zhuowei Li, Zihao Xu, Ligong Han, Yunhe Gao, Song Wen, Di Liu, Hao Wang, Dimitris N. Metaxas
TL;DR
The paper tackles the high inference cost of In-context Learning (ICL) and its sensitivity to demonstration selection and order by introducing Implicit In-context Learning (I2CL). I2CL compresses the demonstrations into a context vector derived from the model's activations, then injects a linear combination of this vector and the query activations into the residual streams. Because demonstration tokens no longer need to be prepended to the query and attended over, inference cost drops to zero-shot levels while few-shot performance is preserved. Key contributions include a context-vectorization pipeline, a lightweight context-injection mechanism with layer-wise scalars learned via Noisy Self-calibration, and a demonstration-agnostic, task-id-like representation that enables transfer. Empirical results on nine tasks across three model architectures show few-shot-level performance, robustness to variations in the demonstrations, and effective transfer learning, supported by ablations and analyses.
Abstract
In-context Learning (ICL) empowers large language models (LLMs) to swiftly adapt to unseen tasks at inference-time by prefixing a few demonstration examples before queries. Despite its versatility, ICL incurs substantial computational and memory overheads compared to zero-shot learning and is sensitive to the selection and order of demonstration examples. In this work, we introduce Implicit In-context Learning (I2CL), an innovative paradigm that reduces the inference cost of ICL to that of zero-shot learning with minimal information loss. I2CL operates by first generating a condensed vector representation, namely a context vector, extracted from the demonstration examples. It then conducts an inference-time intervention through injecting a linear combination of the context vector and query activations back into the model's residual streams. Empirical evaluation on nine real-world tasks across three model architectures demonstrates that I2CL achieves few-shot level performance at zero-shot inference cost, and it exhibits robustness against variations in demonstration examples. Furthermore, I2CL facilitates a novel representation of task-ids, enhancing task similarity detection and fostering effective transfer learning. We also perform a comprehensive analysis and ablation study on I2CL, offering deeper insights into its internal mechanisms. Code is available at https://github.com/LzVv123456/I2CL.
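The two steps described in the abstract, building a context vector from the demonstrations and injecting a linear combination of that vector and the query activations into the residual stream, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of a plain mean to aggregate demonstrations, and the scalar names `lam` and `beta` are all assumptions made for illustration, and activations are represented as plain Python lists rather than model tensors.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_context_vector(demo_activations: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Aggregate per-demonstration activations into one vector per layer.

    demo_activations[d][l] is the activation of demonstration d at layer l.
    Averaging over demonstrations is an assumption of this sketch; it makes
    the result independent of demonstration order.
    """
    num_layers = len(demo_activations[0])
    return [
        mean_vector([demo[layer] for demo in demo_activations])
        for layer in range(num_layers)
    ]


def inject(
    residual: Vector,
    query_activation: Vector,
    context: Vector,
    lam: float,
    beta: float,
) -> Vector:
    """Add a linear combination of context and query activations to the residual.

    lam and beta stand in for the layer-wise scalars (hypothetical names).
    With lam = 0 and beta = 1 this reduces to the usual residual update.
    """
    return [
        r + lam * c + beta * a
        for r, c, a in zip(residual, context, query_activation)
    ]


# Two demonstrations, two layers, hidden size 2 (toy numbers).
demos = [
    [[1.0, 2.0], [0.0, 4.0]],
    [[3.0, 0.0], [2.0, 0.0]],
]
context_vectors = build_context_vector(demos)
updated = inject([1.0, 1.0], [0.5, 0.5], context_vectors[0], lam=0.5, beta=1.0)
```

In this sketch the demonstrations are only touched once, when `build_context_vector` is called; each query then costs one extra vector addition per layer, which is the sense in which inference stays at zero-shot cost.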
