Context Tuning for In-Context Optimization

Jack Lu, Ryan Teehan, Zhenbang Yang, Mengye Ren

TL;DR

Context Tuning proposes a context-centric paradigm for few-shot learning in large language models, introducing CT-Prompt and CT-KV to optimize the in-context demonstration representation instead of updating model weights. CT-KV, which learns per-layer key–value prefixes derived from demonstrations, achieves linear-time training with respect to the number of demonstrations and consistently outperforms traditional prompt-based methods and ICL across NLP-LR, MMLU, BBH, and ARC, while competing with Test-Time Training (TTT). The approach introduces Leave-One-Out Masking and Token Dropout as key design choices, demonstrates robustness to demonstration count and quality, and shows that CT-KV can refine TTT results when used as a post-hoc step. Framed within the In-Context Optimization (ICO) paradigm, the work highlights the practicality and scalability of context-based adaptation as a complement or alternative to weight-based updates in few-shot learning.

Abstract

We introduce Context Tuning, a simple and effective method to significantly enhance few-shot adaptation of large language models (LLMs) without fine-tuning model parameters. While prompt-based adaptation techniques have demonstrated the effectiveness of lightweight adaptation methods for LLMs, they typically initialize a trainable prompt or prefix with tokens that are irrelevant to the task at hand. In contrast, Context Tuning initializes the trainable prompt or prefix with task-specific demonstration examples, leveraging the model's inherent In-Context Learning (ICL) ability to extract relevant information for improved few-shot learning performance. Extensive evaluations on benchmarks such as CrossFit, UnifiedQA, MMLU, BIG-Bench Hard, and ARC demonstrate that Context Tuning outperforms traditional prompt-based adaptation methods and achieves accuracy competitive with Test-Time Training at significantly higher training efficiency.

Paper Structure

This paper contains 48 sections, 17 equations, 10 figures, 12 tables.

Figures (10)

  • Figure 1: Comparison of training-free, prompt-based adaptation, and In-Context Optimization methods on solving 26 NLP-LR tasks from Table \ref{tab:main}. Circles are baselines; stars are our methods; bolded methods attain the best performance-efficiency tradeoff.
  • Figure 2: CT-KV, the variant of Context Tuning that optimizes the key-value prefixes derived from in-context demonstration pairs. CT-KV (left) first initializes a prefix $\{K_i, V_i\}_{i=1}^k$ from demonstration pairs $\{(x_i, y_i)\}_{i=1}^k$, then trains it to solve each pair. To prevent the model from simply retrieving the demonstration pair from the prefix, Leave-One-Out Masking prevents the model from attending to $K_i, V_i$ when solving pair $i$. At generation time (right), the model conditions on all optimized prefixes $\{K_i^*, V_i^*\}_{i=1}^k$ to solve query $x_q$.
  • Figure 3: One test pair from BBH, NLP-LR, and MMLU each, and 3 demonstration pairs followed by a test pair from ARC. BBH contains instructions that we prepend to model inputs. NLP-LR and MMLU contain multiple-choice options for the model to select. To avoid clutter, we show demonstration pairs from BBH, NLP-LR, and MMLU in Appendix \ref{appendix:qualitative}.
  • Figure 4: Performance of ICL, Prefix Tuning, and CT-KV under variations of (a) numbers of demonstration pairs and (b) probabilities of corrupted demonstration labels on NLP-LR and MMLU.
  • Figure 5: Left is an ARC task that CT-KV successfully solves, but ICL does not. Conversely, the task on the right is solved by ICL but not by CT-KV.
  • ...and 5 more figures
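
The Leave-One-Out Masking described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation. It only shows one plausible way to build the mask, assuming the prefix is a concatenation of per-pair key-value segments whose lengths are known. The function name `leave_one_out_mask` and the argument `segment_lengths` are hypothetical names introduced here for illustration.

```python
import numpy as np

def leave_one_out_mask(segment_lengths):
    """Return a boolean matrix of shape (k, total_prefix_len).

    Entry [i, j] is True when the model may attend to prefix position j
    while solving demonstration pair i, i.e. when position j does NOT
    belong to pair i's own key-value segment (K_i, V_i).
    Assumption: the prefix is the concatenation of k per-pair segments.
    """
    lengths = np.asarray(segment_lengths)
    k = len(lengths)
    # owner[j] = index of the demonstration pair that produced position j
    owner = np.repeat(np.arange(k), lengths)
    pair_ids = np.arange(k)[:, None]
    return owner[None, :] != pair_ids

# Three demonstration pairs whose prefix segments span 2, 3, and 1 positions.
mask = leave_one_out_mask([2, 3, 1])
print(mask.astype(int))
```

Each row hides exactly the segment derived from the pair being solved, so the model cannot simply retrieve that pair from the prefix. At generation time, per the caption, no segment is hidden and the query attends to all optimized prefixes.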