Steering Large Reasoning Models towards Concise Reasoning via Flow Matching
Yawei Li, Benjamin Bergner, Yinghan Zhao, Vihang Prakash Patil, Bei Chen, Cheng Wang
TL;DR
FlowSteer addresses the inefficiency of verbose reasoning in large reasoning models by reframing steering as nonlinear distribution transport via Flow Matching. It replaces the traditional linear steering vector with a learned velocity field that maps the verbose activation distribution to the concise one, enabling input-aware, on-manifold transformations. The approach introduces robust training (median-IQR normalization, Huber loss, OT coupling) and a probabilistic guidance mechanism to escape low-velocity regions, achieving superior distributional alignment and improved accuracy-token trade-offs across multiple benchmarks and model scales. Empirically, FlowSteer delivers up to 6.0% absolute accuracy gains and up to 14.5% token reductions in optimal settings, while maintaining minimal parameter overhead and competitive latency, illustrating a principled pathway to efficient, concise reasoning in LRMs.
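The robust training ingredients named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes per-feature median-IQR normalization, the standard straight-line Flow Matching interpolant with target velocity `x1 - x0`, and an elementwise Huber loss with a hypothetical threshold `delta`. The function names are illustrative, and OT coupling (re-pairing verbose and concise samples within a batch) is omitted for brevity.

```python
import numpy as np

def median_iqr_normalize(x, eps=1e-8):
    """Center each feature by its median and scale by its interquartile range."""
    med = np.median(x, axis=0)
    q75, q25 = np.percentile(x, [75, 25], axis=0)
    iqr = q75 - q25
    return (x - med) / (iqr + eps), med, iqr

def flow_matching_pairs(x0, x1, t):
    """Return the straight-line interpolant x_t and its target velocity x1 - x0.

    x0: source (verbose) activations, shape (n, d)
    x1: target (concise) activations, shape (n, d)
    t:  times in [0, 1], shape (n,)
    """
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0

def huber_loss(pred, target, delta=1.0):
    """Quadratic for small residuals, linear for large ones (robust to outliers)."""
    r = np.abs(pred - target)
    quadratic = 0.5 * r ** 2
    linear = delta * (r - 0.5 * delta)
    return np.mean(np.where(r <= delta, quadratic, linear))

# Toy data standing in for verbose/concise activations (purely synthetic).
rng = np.random.default_rng(0)
verbose = rng.normal(loc=2.0, scale=3.0, size=(256, 4))
concise = rng.normal(loc=-1.0, scale=1.0, size=(256, 4))

verbose_n, _, _ = median_iqr_normalize(verbose)
concise_n, _, _ = median_iqr_normalize(concise)

t = rng.uniform(size=256)
x_t, target_velocity = flow_matching_pairs(verbose_n, concise_n, t)

# A trained velocity network would predict target_velocity from (x_t, t);
# here a zero prediction is used only to show how the loss is evaluated.
loss = huber_loss(np.zeros_like(target_velocity), target_velocity)
```

Medians and interquartile ranges are less sensitive to outlier activations than means and standard deviations, and the Huber loss limits the gradient contribution of large residuals, which is the usual motivation for pairing the two.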
Abstract
Large Reasoning Models (LRMs) excel at complex reasoning tasks, but their efficiency is often hampered by overly verbose outputs. Prior steering methods attempt to address this issue by applying a single, global vector to hidden representations, an approach grounded in the restrictive linear representation hypothesis. In this work, we introduce FlowSteer, a nonlinear steering method that goes beyond uniform linear shifts by learning a complete transformation between the distributions associated with verbose and concise reasoning. This transformation is learned via Flow Matching as a velocity field, enabling precise, input-dependent control over the model's reasoning process. By aligning steered representations with the distribution of concise-reasoning activations, FlowSteer yields more compact reasoning than linear shifts. Across diverse reasoning benchmarks, FlowSteer demonstrates strong task performance and token efficiency compared to leading inference-time baselines. Our work demonstrates that modeling the full distributional transport with generative techniques offers a more effective and principled foundation for controlling LRMs.
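The contrast between a single global steering vector and a learned velocity field can be made concrete with a short sketch. It is an illustration under assumptions, not FlowSteer itself: `linear_steer` and `flow_steer` are hypothetical names, the velocity fields are hand-written stand-ins for a trained network, and forward Euler with a fixed step count is assumed as the integrator.

```python
import numpy as np

def linear_steer(h, v, alpha=1.0):
    """Shift every hidden state by the same global vector v."""
    return h + alpha * v

def flow_steer(h, velocity_fn, n_steps=10):
    """Transport hidden states by Euler-integrating a velocity field over t in [0, 1]."""
    x = np.array(h, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

h = np.array([[0.0, 0.0], [3.0, 1.0]])   # two toy hidden states
v = np.array([1.0, -2.0])                # a toy global steering vector

# A constant velocity field reproduces the linear shift: both states move by v.
constant_field = lambda x, t: np.broadcast_to(v, x.shape)
same_as_linear = flow_steer(h, constant_field)

# A state-dependent field moves each input by a different amount.
# Here every state is pulled toward a hypothetical target point.
target = np.zeros(2)
pull_to_target = lambda x, t: target - x
input_dependent = flow_steer(h, pull_to_target, n_steps=10)
```

In this toy setup the linear shift is the special case of a velocity field that ignores both its input and the time variable, whereas a field that depends on the current state produces a different displacement for each hidden state.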
