Controlling Thinking Speed in Reasoning Models
Zhengkai Lin, Zhihang Fu, Ze Chen, Chao Chen, Liang Xie, Wenxiao Wang, Deng Cai, Zheng Wang, Jieping Ye
TL;DR
This work tackles the challenge of balancing speed and accuracy in Large Reasoning Models by introducing dynamic thinking speed control. It first reveals an intrinsic fast/slow thinking switch in LRMs and then derives a PCA-based steering vector from representation differences to modulate reasoning during inference. The paper then pairs this representation-editing approach with an adaptive, difficulty-aware mechanism that uses logit-based signals to adjust thinking speed in real time, achieving improved accuracy and efficiency across multiple models and benchmarks. Collectively, the methods provide a plug-in framework for faster, simpler reasoning on easy tasks and more thorough, correct analyses for complex problems, with broad implications for scalable AI reasoning systems.
Abstract
Human cognition is theorized to operate in two modes: fast, intuitive System 1 thinking and slow, deliberate System 2 thinking. While current Large Reasoning Models (LRMs) excel at System 2 thinking, their inability to perform fast thinking leads to high computational overhead and latency. In this work, we enable LRMs to approximate human intelligence through dynamic thinking speed adjustment, optimizing accuracy-efficiency trade-offs. Our approach addresses two key questions: (1) how to control thinking speed in LRMs, and (2) when to adjust it for optimal performance. For the first question, we identify the steering vector that governs slow-fast thinking transitions in LRMs' representation space. Using this vector, we achieve the first representation editing-based test-time scaling effect, outperforming existing prompt-based scaling methods. For the second question, we apply real-time difficulty estimation to identify reasoning segments of varying complexity. Combining these techniques, we propose the first reasoning strategy that enables fast processing of easy steps and deeper analysis of complex ones. Without any training or additional cost, our plug-in module improves accuracy by an average of 1.3% while reducing token usage by 8.6% across leading LRMs and advanced reasoning benchmarks. All of our algorithms are implemented on top of vLLM and are expected to support broader applications and inspire future research.
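
To make the two ingredients above concrete, the following is a minimal sketch, not the authors' vLLM implementation. It assumes paired hidden states collected from fast and slow responses, takes the steering vector to be the first principal component of their differences (computed here by power iteration), and uses next-token entropy as a stand-in for the logit-based difficulty signal. All function names, the entropy threshold, the steering strengths, and the sign convention (positive strength pushes toward slow thinking) are hypothetical choices made for illustration.

```python
import math


def steering_vector(fast_states, slow_states, iters=200):
    """First principal component of the (slow - fast) hidden-state differences.

    Assumption: each argument is a list of equal-length vectors, paired by index.
    """
    diffs = [[s - f for f, s in zip(fr, sr)]
             for fr, sr in zip(fast_states, slow_states)]
    n, dim = len(diffs), len(diffs[0])
    mean = [sum(row[j] for row in diffs) / n for j in range(dim)]
    centered = [[row[j] - mean[j] for j in range(dim)] for row in diffs]

    # Start power iteration from the centered row with the largest norm.
    v = max(centered, key=lambda r: sum(x * x for x in r))
    for _ in range(iters):
        # w = X^T (X v), which avoids forming the covariance matrix explicitly.
        proj = [sum(r[j] * v[j] for j in range(dim)) for r in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("differences have no variance; PCA is undefined")
        v = [x / norm for x in w]

    # Resolve PCA's sign ambiguity: point the vector from fast toward slow.
    if sum(m * x for m, x in zip(mean, v)) < 0:
        v = [-x for x in v]
    return v


def steer(hidden, vector, alpha):
    """Edit one hidden state: alpha > 0 nudges toward slow, alpha < 0 toward fast."""
    return [h + alpha * x for h, x in zip(hidden, vector)]


def token_entropy(logits):
    """Entropy (in nats) of the softmax distribution over next-token logits."""
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_alpha(logits, threshold=1.0, fast_alpha=-1.0, slow_alpha=1.0):
    """Hypothetical rule: treat high-entropy steps as hard and slow down on them."""
    return slow_alpha if token_entropy(logits) > threshold else fast_alpha
```

In use, `adaptive_alpha` would be evaluated at each decoding step and its result passed to `steer` for the hidden state of the chosen layer. Which layers to edit and how to set the threshold are left open here, since the abstract does not specify them.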
