Steering Large Reasoning Models towards Concise Reasoning via Flow Matching

Yawei Li; Benjamin Bergner; Yinghan Zhao; Vihang Prakash Patil; Bei Chen; Cheng Wang

TL;DR

FlowSteer addresses the inefficiency of verbose reasoning in large reasoning models by reframing steering as nonlinear distribution transport via Flow Matching. It replaces the traditional linear steering vector with a learned velocity field that maps the verbose activation distribution to the concise one, enabling input-aware, on-manifold transformations. The approach introduces robust training (median-IQR normalization, Huber loss, OT coupling) and a probabilistic guidance mechanism to escape low-velocity regions, achieving superior distributional alignment and improved accuracy-token trade-offs across multiple benchmarks and model scales. Empirically, FlowSteer delivers up to 6.0% absolute accuracy gains and up to 14.5% token reductions in optimal settings, while maintaining minimal parameter overhead and competitive latency, illustrating a principled pathway to efficient, concise reasoning in LRMs.
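Two of the robust-training ingredients named above, median-IQR normalization and the Huber loss, can be sketched in a few lines of NumPy. This is only an illustration of the general techniques, not the paper's implementation: the function names, the `eps` guard, and the `delta` threshold are assumptions introduced here.

```python
import numpy as np

def median_iqr_normalize(x, eps=1e-8):
    """Center each feature by its median and scale by its interquartile range.

    Unlike mean/std scaling, both statistics are barely moved by a few
    extreme activations. `eps` is an assumed guard against a zero IQR.
    """
    med = np.median(x, axis=0)
    q75, q25 = np.percentile(x, [75, 25], axis=0)
    iqr = q75 - q25
    return (x - med) / (iqr + eps), med, iqr

def huber_loss(pred, target, delta=1.0):
    """Quadratic for residuals up to `delta`, linear beyond it."""
    r = np.abs(pred - target)
    quadratic = 0.5 * r**2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear).mean()

# A single outlier (1000.0) leaves the median and IQR of the rest untouched.
activations = np.array([[0.0], [1.0], [2.0], [3.0], [1000.0]])
normalized, med, iqr = median_iqr_normalize(activations)
```

In this toy input the outlier would dominate a mean/std normalization, while the median (2.0) and IQR (2.0) are set by the four ordinary values.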

Abstract

Large Reasoning Models (LRMs) excel at complex reasoning tasks, but their efficiency is often hampered by overly verbose outputs. Prior steering methods attempt to address this issue by applying a single, global vector to hidden representations -- an approach grounded in the restrictive linear representation hypothesis. In this work, we introduce FlowSteer, a nonlinear steering method that goes beyond uniform linear shifts by learning a complete transformation between the distributions associated with verbose and concise reasoning. This transformation is learned via Flow Matching as a velocity field, enabling precise, input-dependent control over the model's reasoning process. By aligning steered representations with the distribution of concise-reasoning activations, FlowSteer yields more compact reasoning than linear shifts. Across diverse reasoning benchmarks, FlowSteer demonstrates strong task performance and token efficiency compared to leading inference-time baselines. Our work demonstrates that modeling the full distributional transport with generative techniques offers a more effective and principled foundation for controlling LRMs.
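To make the idea of a velocity field concrete, the sketch below shows the standard conditional Flow Matching regression target on a straight-line path, plus Euler integration of a velocity field at inference time. It is a toy under stated assumptions, not FlowSteer itself: the data are 1-D, the pairing `x1 = A * x0 + B` is a hand-picked affine map, and `toy_velocity` is a closed-form field standing in for the learned network. All names are hypothetical.

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Point on the straight path from x0 to x1, and its velocity target."""
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

def cfm_loss(velocity, x0, x1, t):
    """Mean squared error between a velocity field and the path target."""
    x_t, u_t = cfm_pair(x0, x1, t)
    return np.mean((velocity(x_t, t) - u_t) ** 2)

def euler_transport(velocity, x, n_steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with Euler steps."""
    h = 1.0 / n_steps
    for k in range(n_steps):
        x = x + h * velocity(x, k * h)
    return x

# Assumed toy pairing: the target is a shifted, narrower copy of the source.
A, B = 0.5, 3.0

def toy_velocity(x, t):
    # Closed-form field for the pairing above; a trained model would replace it.
    return (A - 1.0) * (x - t * B) / (1.0 + t * (A - 1.0)) + B

rng = np.random.default_rng(0)
x0 = rng.normal(size=(1000, 1))   # stand-in for "verbose" activations
x1 = A * x0 + B                   # stand-in for "concise" activations
t = rng.uniform(size=(1000, 1))

loss = cfm_loss(toy_velocity, x0, x1, t)
steered = euler_transport(toy_velocity, x0)
```

Because the velocity depends on `x`, each input is moved by a different amount: the steered samples are both shifted and contracted, which a single global vector cannot do.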

Paper Structure (29 sections, 24 equations, 5 figures, 10 tables)

Introduction
Preliminaries
Reducing reasoning path length by linear steering
Preliminaries on Flow Matching
Methodology
Robust training strategy
Probabilistic guidance avoids stagnation in low-velocity zones
Experiments
Implementation
Alignment between steered and target distributions
Evaluation on mathematical and coding tasks
Ablation study
Analysis on space and time complexity
Related work
Conclusion
...and 14 more sections

Figures (5)

Figure 1: Left: The source distribution corresponds to hidden representations that produce verbose CoTs, while the target distribution corresponds to representations that produce concise CoTs. Zoom in for clarity. Middle: Linear steering methods apply the same steering vector (the bold blue arrow) to all source representations, aligning only the means of the two distributions. This ignores higher-order statistics such as covariance, often resulting in a substantial mismatch. Right: Our FlowSteer leverages Flow Matching to learn a mapping from the source distribution to the target distribution, naturally aligning the two due to the theoretical properties of Flow Matching.
Figure 2: Activations from layer 20 of DeepSeek-R1-Distill-Qwen-1.5B.
Figure 3: Average accuracy and token count aggregated across all benchmarks.
Figure 4: Average accuracy as a function of average token count for correct answers.
Figure 5: The line plots report accuracy, while the bar plots show the average token count across all samples. The dashed line and the rightmost bar correspond to the vanilla LRM baseline. For visual clarity, the bars are evenly spaced along the $x$-axis, although the underlying hyperparameter grid is uneven.
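The point made in the Figure 1 caption, that one shared steering vector aligns the means but leaves the covariance untouched, can be checked numerically. The sketch below assumes a difference-of-means steering vector (a common construction, not necessarily the exact baseline used in the paper) and synthetic 2-D Gaussians in place of real activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: the two clouds differ in both mean and shape.
source = rng.normal(loc=[0.0, 0.0], scale=[2.0, 0.5], size=(5000, 2))
target = rng.normal(loc=[3.0, -1.0], scale=[0.5, 2.0], size=(5000, 2))

# One global vector, added to every source point.
steering_vector = target.mean(axis=0) - source.mean(axis=0)
steered = source + steering_vector

# Means now coincide, but the covariance is still that of the source.
mean_gap = np.linalg.norm(steered.mean(axis=0) - target.mean(axis=0))
cov_gap = np.linalg.norm(np.cov(steered.T) - np.cov(target.T))
cov_unchanged = np.allclose(np.cov(steered.T), np.cov(source.T))
```

Here `mean_gap` is zero up to floating-point error while `cov_gap` stays large, which is the mismatch the middle panel of Figure 1 illustrates.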
