Context Tuning for In-Context Optimization
Jack Lu, Ryan Teehan, Zhenbang Yang, Mengye Ren
TL;DR
Context Tuning proposes a context-centric paradigm for few-shot learning in large language models, introducing CT-Prompt and CT-KV to optimize the representation of in-context demonstrations instead of updating model weights. CT-KV, which learns per-layer key–value prefixes derived from demonstrations, trains in time linear in the number of demonstrations and consistently outperforms traditional prompt-based methods and ICL across NLP-LR, MMLU, BBH, and ARC, while remaining competitive with Test-Time Training (TTT). The approach relies on Leave-One-Out Masking and Token Dropout as key design choices, is robust to demonstration count and quality, and can refine TTT results when CT-KV is applied as a post-hoc step. Framed within the In-Context Optimization (ICO) paradigm, the work highlights the practicality and scalability of context-based adaptation as a complement or alternative to weight-based updates in few-shot learning.
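The two design choices named above can be illustrated with a minimal sketch. The summary does not spell out their mechanics, so everything here is an assumption: Leave-One-Out Masking is taken to mean hiding the prefix positions derived from the demonstration currently being predicted, and Token Dropout is taken to mean randomly hiding further prefix positions during training. The names `leave_one_out_mask`, `token_dropout`, and `spans` are hypothetical and not taken from the authors' code.

```python
import random


def leave_one_out_mask(spans, held_out):
    """Return a 0/1 visibility mask over prefix positions.

    spans: list of (start, end) half-open index ranges, one per demonstration,
           marking which prefix positions were derived from that demonstration.
    held_out: index of the demonstration currently used as the training target.
    Assumption: the held-out demonstration's own positions are hidden so the
    model cannot simply read its answer back out of the prefix.
    """
    total = max(end for _, end in spans)
    mask = [1] * total
    start, end = spans[held_out]
    for pos in range(start, end):
        mask[pos] = 0
    return mask


def token_dropout(mask, rate, rng):
    """Hide each still-visible position independently with probability `rate`."""
    return [0 if (m == 1 and rng.random() < rate) else m for m in mask]


# Three demonstrations occupying 9 prefix positions in total.
spans = [(0, 3), (3, 7), (7, 9)]
mask = leave_one_out_mask(spans, held_out=1)
print(mask)  # [1, 1, 1, 0, 0, 0, 0, 1, 1]

# Token dropout only ever hides more positions; it never reveals hidden ones.
dropped = token_dropout(mask, rate=0.5, rng=random.Random(0))
```

In an actual transformer the mask would be applied to attention over the prefix at every layer; the list of 0s and 1s here only shows which positions would be visible.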
Abstract
We introduce Context Tuning, a simple and effective method that significantly enhances few-shot adaptation of large language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation for LLMs, they typically initialize the trainable prompt or prefix with tokens that are irrelevant to the task at hand. In contrast, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples, leveraging the model's inherent In-Context Learning (ICL) ability to extract relevant information for improved few-shot learning performance. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC demonstrate that Context Tuning outperforms traditional prompt-based adaptation methods and achieves accuracy competitive with Test-Time Training at significantly higher training efficiency.
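The contrast in initialization described in the abstract can be sketched with a toy example. This is an illustrative sketch only, not the authors' implementation: the embedding table, the token ids, and the helper `init_prompt` are all hypothetical, and it covers only the input-embedding case (a trainable prompt), not per-layer key–value prefixes.

```python
import random


def init_prompt(embedding, token_ids):
    """Copy the embedding rows for `token_ids` into a new trainable prompt.

    Rows are copied so that later updates to the prompt leave the frozen
    model embedding table untouched.
    """
    return [list(embedding[t]) for t in token_ids]


rng = random.Random(0)
vocab_size, dim = 20, 4

# Stand-in for a frozen model embedding table (hypothetical values).
embedding = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

# Hypothetical token ids obtained by tokenizing the few-shot demonstrations.
demo_token_ids = [5, 9, 2, 14, 7, 3]

# Conventional prompt tuning: start from tokens unrelated to the task.
unrelated_ids = [rng.randrange(vocab_size) for _ in demo_token_ids]
baseline_prompt = init_prompt(embedding, unrelated_ids)

# Context Tuning-style initialization: start from the demonstration tokens.
context_prompt = init_prompt(embedding, demo_token_ids)

# Only the prompt is trained; a placeholder update stands in for a gradient step.
context_prompt[0][0] += 0.1
```

Both prompts have the same shape and would be trained the same way; the only difference in this sketch is the starting point, which is the distinction the abstract draws.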
