Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs

Pengrui Han, Xueqiang Xu, Keyang Xuan, Peiyang Song, Siru Ouyang, Runchu Tian, Yuqing Jiang, Cheng Qian, Pengcheng Jiang, Jiashuo Sun, Junxia Cui, Ming Zhong, Ge Liu, Jiawei Han, Jiaxuan You

TL;DR

Steer2Adapt reframes LLM adaptation from learning a single task-specific steering vector to composing a linear combination of reusable semantic concept vectors within a domain-specific subspace. By constraining adaptation to a low-dimensional subspace and optimizing coefficients with a stability-aware Bayesian objective, it achieves data-efficient, transparent, inference-time adaptation across reasoning and safety tasks. Empirical results show consistent gains across three backbone models with strong generalization and favorable efficiency, while analyses reveal sensitivity to subspace relevance and entanglement among basis directions. The approach offers a scalable path for robust, task-aware behavior modulation without parameter updates, with practical implications for rapid deployment in dynamic environments.

Abstract

Activation steering has emerged as a promising approach for efficiently adapting large language models (LLMs) to downstream behaviors. However, most existing steering methods rely on a single static direction per task or concept, making them inflexible under task variation and inadequate for complex tasks that require multiple coordinated capabilities. To address this limitation, we propose STEER2ADAPT, a lightweight framework that adapts LLMs by composing steering vectors rather than learning new ones from scratch. In many domains (e.g., reasoning or safety), tasks share a small set of underlying concept dimensions. STEER2ADAPT captures these dimensions as a reusable, low-dimensional semantic prior subspace, and adapts to new tasks by dynamically discovering a linear combination of basis vectors from only a handful of examples. Experiments across 9 tasks and 3 models in both reasoning and safety domains demonstrate the effectiveness of STEER2ADAPT, achieving an average improvement of 8.2%. Extensive analyses further show that STEER2ADAPT is a data-efficient, stable, and transparent inference-time adaptation method for LLMs.
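
The composition idea in the abstract can be illustrated with a minimal sketch. It assumes the composed vector is a plain weighted sum of concept vectors and that steering means adding that vector to a hidden state; the abstract states only the linear combination, so the additive step, the helper names `compose_steering_vector` and `steer`, and the toy 4-dimensional vectors are all hypothetical and are not the authors' code.

```python
from typing import List, Sequence


def compose_steering_vector(basis: Sequence[Sequence[float]],
                            coeffs: Sequence[float]) -> List[float]:
    """Return sum_i coeffs[i] * basis[i] (hypothetical helper)."""
    if not basis:
        raise ValueError("basis must contain at least one vector")
    if len(basis) != len(coeffs):
        raise ValueError("need exactly one coefficient per basis vector")
    dim = len(basis[0])
    composed = [0.0] * dim
    for vec, c in zip(basis, coeffs):
        if len(vec) != dim:
            raise ValueError("all basis vectors must share one dimension")
        for j, x in enumerate(vec):
            composed[j] += c * x
    return composed


def steer(hidden: Sequence[float], steering: Sequence[float]) -> List[float]:
    """Add the composed vector to one hidden state.

    Additive steering is an assumption of this sketch.
    """
    if len(hidden) != len(steering):
        raise ValueError("hidden state and steering vector must match in size")
    return [h + s for h, s in zip(hidden, steering)]


# Toy example: two concept directions in a 4-dimensional activation space.
basis = [
    [1.0, 0.0, 0.0, 0.0],  # stand-in for one concept vector
    [0.0, 1.0, 0.0, 0.0],  # stand-in for another concept vector
]
coeffs = [0.5, -2.0]       # the coefficients are what gets adapted per task
v = compose_steering_vector(basis, coeffs)
steered = steer([1.0, 1.0, 1.0, 1.0], v)
```

Only the short coefficient list changes from task to task in this sketch; the basis vectors are reused, which is the source of the data efficiency the abstract describes.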

Paper Structure (33 sections, 13 equations, 8 figures, 10 tables)

This paper contains 33 sections, 13 equations, 8 figures, and 10 tables.

Figures (8)

  • Figure 1: Comparison of Task-Vector Steering, Semantic-Driven Vector Steering, and Steer2Adapt. (a) Task-Vector Steering derives task vectors through large-scale data training; while effective, this approach is computationally intensive and lacks semantic interpretability. (b) Concept-Vector Steering utilizes pre-defined semantic concept vectors, which often lack the necessary expressiveness for complex downstream tasks. (c) Steer2Adapt (ours) employs Bayesian Optimization with minimal examples to find an optimal linear combination of concept vectors, achieving high performance while remaining data-efficient and semantically transparent.
  • Figure 2: Steer2Adapt Overview. (1) Semantic prior subspace construction: guided by human insight, we define a set of concepts that affect model performance in a domain and extract the corresponding steering vectors to form a semantic prior subspace within the LLM's activation space. (2) Composed vector search: using only a few task examples, we run Bayesian optimization over the subspace coefficients with a stability-aware objective that rewards fixing wrong predictions while penalizing flips from correct to incorrect, yielding a composed steering vector for inference-stage model steering.
  • Figure 3: Steer2Adapt delivers strong, consistent improvements across both reasoning and safety domains. Top row: reasoning results; bottom row: safety results. Left: Task generalization, measured by average percentage improvement over the baseline across models for each task. Middle: Model generalization, measured by average percentage improvement over the baseline across tasks for each backbone model. Right: Reliability and gain distribution, showing performance changes across all evaluation scenarios (reasoning: $5$ tasks $\times$ $3$ models per method; safety: $4$ tasks $\times$ $3$ models per method). Across both domains, Steer2Adapt achieves strong average gains while exhibiting compact, positively centered distributions, indicating robust and consistent performance.
  • Figure 4: Steer2Adapt depends on basis direction relevance and is robust to moderate subspace noise. (a) Steering reasoning with a mismatched subspace (safety directions) causes large performance drops and higher variance. (b) Adding a small number of less relevant directions to the reasoning subspace leads to only minor performance changes. (c) Task vectors from relevant tasks can form an effective steering subspace with performance comparable to semantic subspaces.
  • Figure 5: Steer2Adapt achieves the best performance-efficiency trade-off. We report an efficiency score that measures the gain in task performance per unit of inference cost, computed as $\text{Efficiency}=(\text{Improvement}-\text{Minimum Performance})/\text{Inference Overhead}$.
  • ...and 3 more figures
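
The composed vector search described in the Figure 2 caption can be sketched roughly as follows. The caption says only that the objective rewards fixing wrong predictions and penalizes flips from correct to incorrect; the exact formula is not given here, so the `fixes - flip_penalty * flips` form, the `flip_penalty` weight, and every function name below are assumptions. To stay dependency-free, the sketch also swaps Bayesian optimization for plain random search over the coefficients, and `toy_evaluate` stands in for running a steered model on a few task examples.

```python
import random
from typing import Callable, List, Sequence, Tuple


def stability_aware_score(base_correct: Sequence[bool],
                          steered_correct: Sequence[bool],
                          flip_penalty: float = 1.0) -> float:
    """Reward wrong->right fixes and penalize right->wrong flips.

    The weighting and normalization are assumptions of this sketch.
    """
    if not base_correct or len(base_correct) != len(steered_correct):
        raise ValueError("need two non-empty result lists of equal length")
    pairs = list(zip(base_correct, steered_correct))
    fixes = sum(1 for b, s in pairs if not b and s)
    flips = sum(1 for b, s in pairs if b and not s)
    return (fixes - flip_penalty * flips) / len(base_correct)


def search_coefficients(evaluate: Callable[[Sequence[float]], List[bool]],
                        num_basis: int,
                        base_correct: Sequence[bool],
                        trials: int = 50,
                        bound: float = 2.0,
                        seed: int = 0) -> Tuple[List[float], float]:
    """Random search over coefficients (a stand-in for Bayesian optimization)."""
    rng = random.Random(seed)
    best_coeffs = [0.0] * num_basis  # all zeros means no steering
    best_score = stability_aware_score(base_correct, evaluate(best_coeffs))
    for _ in range(trials):
        coeffs = [rng.uniform(-bound, bound) for _ in range(num_basis)]
        score = stability_aware_score(base_correct, evaluate(coeffs))
        if score > best_score:
            best_coeffs, best_score = coeffs, score
    return best_coeffs, best_score


def toy_evaluate(coeffs: Sequence[float]) -> List[bool]:
    """Hypothetical evaluator: per-example correctness under steering."""
    return [
        True,                                # always answered correctly
        coeffs[1] > -1.0,                    # breaks if direction 1 is pushed too low
        coeffs[0] > 0.5,                     # fixed by enough of direction 0
        coeffs[0] > 0.5 and coeffs[1] < 1.0  # fixed only by a balanced mix
    ]


base_correct = toy_evaluate([0.0, 0.0])  # unsteered results
best_coeffs, best_score = search_coefficients(toy_evaluate, 2, base_correct)
```

Starting from the all-zero coefficients means the search can never return a setting that scores worse than leaving the model unsteered, which mirrors the stability goal stated in the caption.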