ELICIT: LLM Augmentation via External In-Context Capability

Futing Wang, Jianhao Yan, Yue Zhang, Tao Lin

TL;DR

ELICIT addresses the need for adaptive LLM augmentation without retraining or token-heavy prompting by externalizing in-context learned capabilities as task vectors stored in a capability library. The framework combines a capability library with a dynamic retrieval module to selectively intervene in hidden states at a learned layer, using additive or replacement strategies to elicit capabilities with minimal overhead. Empirical results show consistent improvements across diverse models and tasks, including generalization to unseen tasks and complementary gains when paired with BM25 for smaller models, while revealing scale-dependent effects for larger models. These findings highlight a scalable, plug-and-play approach to expanding LLM versatility and efficiency, with practical implications for on-demand capability elicitation in real-world deployments.

Abstract

Enhancing the adaptive capabilities of large language models is a critical pursuit in both research and application. Traditional fine-tuning methods require substantial data and computational resources, especially for enhancing specific capabilities, while in-context learning is limited by the need for appropriate demonstrations and efficient token usage. Inspired by the expression of in-context learned capabilities through task vectors and the concept of modularization, we propose ELICIT, a framework consisting of two modules designed to effectively store and reuse task vectors to elicit the diverse capabilities of models without additional training or inference tokens. Our comprehensive experiments and analysis demonstrate that our pipeline is highly transferable across different input formats, tasks, and model architectures. ELICIT serves as a plug-and-play performance booster to enable adaptive elicitation of model capabilities. By externally storing and reusing vectors that represent in-context learned capabilities, ELICIT not only demonstrates the potential to operate modular capabilities but also significantly enhances the performance, versatility, adaptability, and scalability of large language models. Our code will be publicly available at https://github.com/LINs-lab/ELICIT.

Paper Structure

This paper contains 37 sections, 1 theorem, 2 equations, 16 figures, 17 tables.

Key Result

Lemma 3.3

Given a task vector $\boldsymbol{\theta}$ that effectively captures the information from demonstrations in an ICL setting, we can simulate the behavior of regular ICL using only the query.

Figures (16)

  • Figure 1: Illustration of ELICIT, which dynamically retrieves and integrates task vectors from a capability library to augment a language model's performance on arbitrary queries, without increasing token usage during inference.
  • Figure 2: Overview of the proposed ELICIT framework for Large Language Model Augmentation. ELICIT consists of two modular components: (1) Build Capability Library - constructing a library of task-specific task vectors by learning from diverse tasks; (2) Dynamic Capability Elicitation - dynamically retrieving and integrating relevant task vectors from the library to augment the model's capability for an arbitrary input query.
  • Figure 3: Varying intervention strengths affect accuracy and cross-entropy loss in Llama3-8B on the validation set of $20$ tasks across different layers. Higher intervention strengths improve average task performance across layers but negatively impact language modeling capabilities. This reveals a trade-off between task-specific enhancement and general language modeling proficiency when using task vectors.
  • Figure 4: Precision-Recall curves and recall sweep for Llama3-8B on the validation set across 20 tasks. (a) Precision-Recall curves for the retriever across 20 tasks (AUC=0.96), guiding threshold selection for high recall and precision. (b) Validation set accuracy after intervention using different recall thresholds.
  • Figure 5: Performance of ELICIT across different domains when the library contains only math-related task vectors, on Mistral.
  • ...and 11 more figures
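
The two components named in the Figure 2 caption, together with the thresholded retrieval that Figure 4 evaluates, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `CapabilityLibrary` class, the `retrieve` function, and especially `overlap_score`, a simple word-overlap score standing in for the paper's retriever, whose actual form is not described in this summary.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityLibrary:
    """Toy store of task vectors (hypothetical structure, not the authors' code)."""
    entries: list = field(default_factory=list)

    def add(self, task, prompt, vector, layer):
        # Each entry keeps an example prompt, the task vector, and the layer
        # at which that vector would be applied.
        self.entries.append(
            {"task": task, "prompt": prompt, "vector": vector, "layer": layer}
        )


def overlap_score(query, prompt):
    """Stand-in relevance score: Jaccard overlap of lowercase word sets."""
    q, p = set(query.lower().split()), set(prompt.lower().split())
    union = q | p
    return len(q & p) / len(union) if union else 0.0


def retrieve(library, query, threshold):
    """Return the best-scoring entry, or None if no entry clears the threshold."""
    scored = [(overlap_score(query, e["prompt"]), e) for e in library.entries]
    if not scored:
        return None
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best if best_score >= threshold else None
```

Returning `None` below the threshold is what makes the intervention selective in this sketch: a query with no relevant stored capability leaves the model untouched.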

Theorems & Definitions (4)

  • Definition 3.1: Hidden State Representation in Transformers
  • Definition 3.2: Task Vector $\boldsymbol{\theta}$
  • Lemma 3.3: Task Vector for ICL Simulation
  • Remark 3.4: Intervention of Task Vector $\boldsymbol{\theta}$
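
The additive and replacement strategies mentioned in the TL;DR, and the intervention strength varied in Figure 3, can be sketched on a single hidden-state vector. This is a minimal illustration under stated assumptions: the function name `intervene`, the `strength` parameter, and the exact update rules are hypothetical and are not taken from Remark 3.4, whose formula is not reproduced in this summary.

```python
def intervene(hidden, theta, strength=1.0, mode="add"):
    """Apply a task vector to one hidden state (plain Python lists of floats).

    mode="add":     hidden + strength * theta  (assumed additive form)
    mode="replace": theta replaces the hidden state outright (assumed form)
    """
    if len(hidden) != len(theta):
        raise ValueError("hidden state and task vector must have the same size")
    if mode == "add":
        return [h + strength * t for h, t in zip(hidden, theta)]
    if mode == "replace":
        return list(theta)
    raise ValueError(f"unknown mode: {mode!r}")
```

In a real model this update would be applied to the hidden state at the chosen layer during the forward pass; the sketch omits that wiring because it depends on the model implementation.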