Beyond Demonstrations: Dynamic Vector Construction from Latent Representations

Wang Cai, Hsiu-Yuan Huang, Zhixiang Wang, Yunfang Wu

TL;DR

DyVec tackles the inefficiency of few-shot In-Context Learning (ICL) and the fragility of existing In-Context derived Vector (ICV) methods with a three-pronged approach: Exhaustive Query Rotation (EQR) to derive robust latent representations, Dynamic Latent Segmentation to tailor vector granularity to the task, and a REINFORCE-based strategy to learn where each vector segment is injected for inference-time intervention. The method takes semantically aggregated latent representations from Multi-Head Attention and injects them into frozen LLMs, matching or exceeding few-shot ICL performance without updating model weights. Empirical results across six tasks and three models show that DyVec consistently outperforms few-shot ICL, LoRA, and prior ICV baselines while maintaining high inference efficiency. This work advances practical, data-efficient, and robust task adaptation for large language models by demonstrating the value of structured latent signals and learned injection strategies.
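
The EQR idea can be illustrated with a minimal sketch. It assumes (our reading, not confirmed by this summary) that with k labeled examples, each example takes the query slot exactly once while the other k-1 serve as demonstrations, and that the representations extracted from the k prompts are averaged; unlike sampling N random prompts, this removes the dependence on which example happens to be the query. The prompt template, `rotated_prompts`, `eqr_representation`, and `toy_extract` are hypothetical; `toy_extract` stands in for reading activations from an LLM.

```python
"""Minimal Exhaustive Query Rotation sketch (hypothetical, not the authors' code)."""
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, label)
Extractor = Callable[[str], List[float]]  # stand-in for reading LLM activations


def format_prompt(demos: Sequence[Example], query: str) -> str:
    """Render demonstrations followed by an unanswered query (assumed template)."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def rotated_prompts(examples: Sequence[Example]) -> List[str]:
    """One prompt per example: example i is the query, all others are demonstrations."""
    prompts = []
    for i, (query, _label) in enumerate(examples):
        demos = [ex for j, ex in enumerate(examples) if j != i]
        prompts.append(format_prompt(demos, query))
    return prompts


def eqr_representation(examples: Sequence[Example], extract: Extractor) -> List[float]:
    """Average, dimension by dimension, the representations of all rotated prompts."""
    reps = [extract(p) for p in rotated_prompts(examples)]
    return [sum(values) / len(reps) for values in zip(*reps)]


def toy_extract(prompt: str) -> List[float]:
    """Placeholder 'activation': prompt length and number of Input fields."""
    return [float(len(prompt)), float(prompt.count("Input:"))]


examples = [("a", "1"), ("bb", "2"), ("ccc", "3")]
print(len(rotated_prompts(examples)))            # 3
print(eqr_representation(examples, toy_extract))  # [59.0, 3.0]
```

In a real setting `extract` would run the frozen LLM on each prompt and read the chosen latent representation at the final token; the averaging step is what damps the prompt-to-prompt variance the abstract attributes to ICL.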

Abstract

In-Context derived Vector (ICV) methods extract task-relevant representations from large language models (LLMs) and reinject them during inference, achieving performance comparable to few-shot In-Context Learning (ICL) without repeatedly processing demonstrations. However, existing ICV methods remain sensitive to ICL-specific factors, often use coarse or semantically fragmented representations as the source of the vector, and rely on heuristically chosen injection positions, which limits their applicability. To address these issues, we propose Dynamic Vector (DyVec), which uses an Exhaustive Query Rotation (EQR) strategy to extract robust, semantically aggregated latent representations by mitigating the variance introduced by ICL. It then applies Dynamic Latent Segmentation and Injection to adaptively partition representations according to task complexity, and uses REINFORCE-based optimization to learn the optimal injection position for each segment. Experimental results show that DyVec outperforms few-shot ICL, LoRA, and prior ICV baselines. Further analysis highlights the effectiveness of dynamically segmenting and injecting semantically aggregated latent representations. DyVec provides a lightweight and data-efficient solution for inference-time task adaptation.
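
The segmentation and position-learning steps can be sketched as follows, under loudly stated assumptions: the latent vector is split into m contiguous, near-equal segments (here m is simply given, whereas DyVec chooses the granularity according to task complexity); each segment gets its own categorical policy, a softmax over logits, across a hypothetical set of candidate injection positions; and the policies are trained with plain REINFORCE and a moving-average baseline. `toy_reward` is a stand-in for the task score obtained when the segments are injected at the sampled positions, and every function name here is illustrative.

```python
"""Segmentation + REINFORCE position search (hypothetical sketch)."""
import math
import random
from typing import Callable, List, Sequence


def split_into_segments(vector: Sequence[float], m: int) -> List[List[float]]:
    """Split a vector into m contiguous segments whose sizes differ by at most 1."""
    if not 1 <= m <= len(vector):
        raise ValueError("m must be between 1 and len(vector)")
    base, extra = divmod(len(vector), m)
    segments, start = [], 0
    for i in range(m):
        size = base + (1 if i < extra else 0)
        segments.append(list(vector[start:start + size]))
        start += size
    return segments


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_positions(num_segments: int, num_positions: int,
                        reward_fn: Callable[[List[int]], float],
                        steps: int = 2000, lr: float = 0.1, seed: int = 0) -> List[int]:
    """Learn one categorical policy over injection positions per segment."""
    rng = random.Random(seed)
    logits = [[0.0] * num_positions for _ in range(num_segments)]
    baseline = 0.0
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        # Sample an injection position for every segment from its own policy.
        actions = [rng.choices(range(num_positions), weights=p)[0] for p in probs]
        reward = reward_fn(actions)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
        # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
        for s, a in enumerate(actions):
            for k in range(num_positions):
                grad = (1.0 if k == a else 0.0) - probs[s][k]
                logits[s][k] += lr * advantage * grad
    # Report the most probable position for each segment.
    return [max(range(num_positions), key=row.__getitem__) for row in logits]


# Toy problem: 3 segments, 4 candidate positions, reward = fraction placed at TARGET.
TARGET = [3, 1, 0]


def toy_reward(actions: List[int]) -> float:
    return sum(a == t for a, t in zip(actions, TARGET)) / len(TARGET)


print(split_into_segments(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(reinforce_positions(3, 4, toy_reward))
```

On this toy problem the learned positions should recover `TARGET`. All segments share a single scalar reward, as they would when one forward pass scores the whole injected vector, which is why a baseline is used to reduce the variance of the gradient estimate.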

Paper Structure

This paper contains 37 sections, 10 equations, 4 figures, 14 tables, and 1 algorithm.

Figures (4)

  • Figure 1: General pipeline of In-Context derived Vector (ICV) methods, illustrating how task-specific representations are extracted from LLMs during few-shot ICL to construct vectors, which are then injected back into frozen LLMs for inference-time intervention and task adaptation. These representations can be either raw activations (e.g., attention heads) or more abstract latent states (e.g., transformer layer outputs).
  • Figure 2: Overview of our proposed method (DyVec).
  • Figure 3: Relative inference time across different models and methods.
  • Figure 4: Effectiveness of the EQR strategy in DyVec across different models. Results are averaged over three tasks with 8-shot data. Solid lines represent performance using different numbers of randomly constructed prompts ($N = 1, 50, 100$), while dashed lines indicate performance using the EQR strategy.
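
The extract-then-inject pipeline in Figure 1 can be illustrated with a toy, hypothetical model. This is a generic activation-addition sketch under our own assumptions: the "frozen LLM" is a fixed stack of layer functions acting on a hidden-state vector, the task vector is the hidden state read after one layer during a demonstration-conditioned pass, and injection adds `alpha` times that vector after the same layer during a query-only pass. It does not reproduce DyVec's segmentation or learned positions; `capture`, `forward`, `LAYERS`, and `alpha` are illustrative names.

```python
"""Toy extract-then-inject sketch of the ICV pipeline (hypothetical)."""
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def capture(layers: Sequence[Layer], h: Vector, layer_idx: int) -> Vector:
    """Extraction: return the hidden state right after layer `layer_idx`."""
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == layer_idx:
            return h
    raise IndexError(f"layer {layer_idx} out of range")


def forward(layers: Sequence[Layer], h: Vector, inject_at: Optional[int] = None,
            vector: Optional[Vector] = None, alpha: float = 1.0) -> Vector:
    """Inference: run the frozen layers, adding alpha * vector after `inject_at`."""
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == inject_at and vector is not None:
            h = [x + alpha * v for x, v in zip(h, vector)]
    return h


# Toy frozen layers: their parameters are fixed and never updated.
LAYERS: List[Layer] = [
    lambda h: [2.0 * x for x in h],
    lambda h: [x + 1.0 for x in h],
    lambda h: [0.5 * x for x in h],
]

demo_input = [1.0, 2.0]   # stands in for the encoded few-shot prompt
query_input = [0.0, 0.0]  # stands in for the encoded zero-shot query
task_vector = capture(LAYERS, demo_input, layer_idx=0)

print(forward(LAYERS, query_input))                                   # [0.5, 0.5]
print(forward(LAYERS, query_input, inject_at=0, vector=task_vector))  # [1.5, 2.5]
```

With a real transformer, the same two steps would read and modify activations through framework hooks rather than plain functions; setting `alpha=0.0` recovers the unmodified zero-shot pass.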