Steering LLM Reasoning Through Bias-Only Adaptation

Viacheslav Sinii; Alexey Gorbatovski; Artem Cherepanov; Boris Shaposhnikov; Nikita Balagansky; Daniil Gavrilov

TL;DR

Fine-tuning large language models for mathematical reasoning is expensive. This paper instead trains one steering vector per layer with reinforcement learning while keeping all base weights frozen. Across multiple base models and benchmarks, the approach matches the accuracy of fully RL-tuned reasoning models while training a number of parameters equal to only about $1.6\times 10^{-5}$ of the base model's parameter count (for an 8-billion-parameter model), which substantially reduces optimizer memory and inter-GPU communication. A logit-lens analysis shows that the learned vectors amplify coherent token directions, which helps interpret the model's internal reasoning. Together, these results show that minimal, interpretable adaptations can unlock high-level reasoning at much lower resource cost, challenging the need for large adapter networks.
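The quoted parameter fraction can be sanity-checked with a few lines of arithmetic. The sketch below assumes a hypothetical 8B-class configuration of 32 layers with hidden size 4096 and exactly $8\times 10^{9}$ base parameters; these values are illustrative assumptions, not figures taken from the paper.

```python
def steering_param_fraction(num_layers: int, hidden_dim: int, base_params: float) -> float:
    """Fraction of extra trainable parameters when one d-dimensional
    steering vector is added per layer and all base weights stay frozen."""
    trainable = num_layers * hidden_dim  # one vector of size d per layer
    return trainable / base_params


# Assumed configuration (hypothetical, for illustration only).
fraction = steering_param_fraction(num_layers=32, hidden_dim=4096, base_params=8e9)

print(f"{fraction:.2e}")          # 1.64e-05
print(f"{100 * fraction:.4f}%")   # 0.0016%
```

Under these assumptions the trainable set is 131,072 scalars, which is consistent with the order of magnitude stated above.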

Abstract

We show that training a single $d$-dimensional steering vector per layer with reinforcement learning, while freezing all base weights, matches the accuracy of fully RL-tuned reasoning models on mathematical-reasoning tasks. On an 8-billion-parameter model this adds only $\approx 0.0016\%$ to the parameter count, and the match in performance holds across a range of base models and mathematical-reasoning benchmarks. These results tighten the upper bound on the parameter budget required for high-level chain-of-thought reasoning, indicating that millions of adapter weights are unnecessary. The minimal trainable footprint reduces optimizer memory and inter-GPU communication, lowering the overall cost of fine-tuning. Moreover, a logit-lens analysis shows that the learned vectors amplify coherent token directions, providing clearer insight into the model's internal computations.
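To make the idea of a per-layer steering vector concrete, here is a minimal toy sketch in plain Python. It assumes the vector is added to the hidden state after each frozen residual block; the exact insertion point, the toy layers, and all names (`forward_with_steering`, `frozen_layers`, `steering`) are hypothetical and not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def forward_with_steering(x: Vector, frozen_layers: List[Layer],
                          steering: List[Vector]) -> Vector:
    """Run a toy residual stack, adding one steering vector per layer.

    The layers are treated as frozen functions; only `steering` would be
    updated during training (assumed placement: after each block).
    """
    h = x
    for layer, v in zip(frozen_layers, steering):
        h = add(h, layer(h))  # frozen residual block
        h = add(h, v)         # trainable bias-like steering vector
    return h


# Toy stand-in for frozen transformer blocks: each one halves its input.
frozen_layers = [lambda h: [0.5 * t for t in h]] * 2

zero_steering = [[0.0, 0.0], [0.0, 0.0]]
learned_steering = [[0.1, 0.0], [0.0, 0.2]]  # hypothetical values

base_out = forward_with_steering([1.0, 2.0], frozen_layers, zero_steering)
steered_out = forward_with_steering([1.0, 2.0], frozen_layers, learned_steering)
num_trainable = sum(len(v) for v in learned_steering)  # layers * d
```

With all steering vectors set to zero the stack reproduces the frozen model's output, so the trainable state is exactly `num_layers * d` scalars, which is what keeps optimizer memory and gradient communication small.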
