Steering LLM Reasoning Through Bias-Only Adaptation

Viacheslav Sinii, Alexey Gorbatovski, Artem Cherepanov, Boris Shaposhnikov, Nikita Balagansky, Daniil Gavrilov

TL;DR

The paper addresses the high cost of fine-tuning large language models for mathematical reasoning by introducing per-layer steering vectors trained with reinforcement learning while the base weights stay frozen. This approach achieves parity with fully RL-tuned reasoning models across multiple base models and benchmarks while training only a $1.6\times 10^{-5}$ fraction of the model's parameter count, and it yields substantial savings in optimizer memory and inter-GPU communication. A logit-lens analysis reveals that the learned vectors amplify coherent token directions, aiding interpretability of the model's internal reasoning. Collectively, the work demonstrates that minimal, interpretable adaptations can unlock high-level reasoning with dramatically reduced resource requirements, challenging the need for large adapter networks.

Abstract

We show that training a single $d$-dimensional steering vector per layer with reinforcement learning, while freezing all base weights, matches the accuracy of fully RL-tuned reasoning models on mathematical-reasoning tasks. On an 8-billion-parameter model this adds only $\approx 0.0016\%$ to the parameter count, and the result holds across a range of base models and mathematical-reasoning benchmarks. These results tighten the upper bound on the parameter budget required for high-level chain-of-thought reasoning, indicating that millions of adapter weights are unnecessary. The minimal trainable footprint reduces optimizer memory and inter-GPU communication, lowering the overall cost of fine-tuning. Moreover, a logit-lens analysis shows that the learned vectors amplify coherent token directions, providing clearer insight into the model's internal computations.
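
As a rough sanity check on the $\approx 0.0016\%$ figure, the arithmetic can be sketched as follows. The configuration values (`d_model = 4096`, `num_layers = 32`, and a nominal `8.0e9` base parameters) are assumptions chosen to resemble a typical 8B-class model; they are not taken from the text above.

```python
# Hypothetical 8B-class configuration (assumed for illustration only).
d_model = 4096        # residual-stream width d
num_layers = 32       # number of transformer layers
base_params = 8.0e9   # nominal count of frozen base parameters

# One d-dimensional steering vector per layer is the only trainable state.
steering_params = num_layers * d_model
fraction = steering_params / base_params

print(steering_params)            # 131072
print(f"{fraction:.2e}")          # 1.64e-05
print(f"{100 * fraction:.4f}%")   # 0.0016%
```

Under these assumed values the trainable footprint is about 131 thousand scalars, which is consistent with the stated fraction of roughly $1.6\times 10^{-5}$.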

Paper Structure

This paper contains 21 sections, 4 equations, 1 figure, 6 tables.

Figures (1)

  • Figure 1: Layer-wise trainable steering vectors. All base transformer weights are frozen (blue). The only trainable parameters are one $d$-dimensional vector $v_\ell$ per layer (orange), added to the residual stream at every token position: $h_{\ell,t}\ \leftarrow\ h_{\ell,t} + v_\ell$.
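
The update in the Figure 1 caption can be illustrated with a minimal pure-Python sketch. The helper names (`add_steering`, `forward`), the toy identity layers, and the choice to add $v_\ell$ right after each frozen layer's output are all assumptions made for illustration; the caption only states that $v_\ell$ is added to the residual stream at every token position.

```python
from typing import Callable, List

Vector = List[float]
Hidden = List[Vector]  # one hidden vector per token position

def add_steering(hidden: Hidden, v: Vector) -> Hidden:
    """Apply h_{l,t} <- h_{l,t} + v_l at every token position t."""
    return [[h_i + v_i for h_i, v_i in zip(h_t, v)] for h_t in hidden]

def forward(hidden: Hidden,
            frozen_layers: List[Callable[[Hidden], Hidden]],
            steering: List[Vector]) -> Hidden:
    """Run frozen layers, adding one trainable vector per layer."""
    for layer, v in zip(frozen_layers, steering):
        hidden = layer(hidden)            # frozen base computation (no trainable weights)
        hidden = add_steering(hidden, v)  # the only trainable parameters
    return hidden

# Toy example: two identity "layers", d = 2, two token positions.
identity = lambda h: h
steering = [[1.0, 0.0], [0.0, 2.0]]       # v_1 and v_2
hidden_in = [[0.0, 0.0], [1.0, 1.0]]
hidden_out = forward(hidden_in, [identity, identity], steering)
print(hidden_out)  # [[1.0, 2.0], [2.0, 3.0]]
```

Because the same $v_\ell$ is broadcast over all token positions, the trainable state per layer is a single vector of length $d$, independent of sequence length.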