MixReasoning: Switching Modes to Think
Haiquan Lu, Gongfan Fang, Xinyin Ma, Qi Li, Xinchao Wang
TL;DR
MixReasoning addresses the inefficiency of uniformly long chain-of-thought reasoning by adapting reasoning depth within a single model response. A lightweight LoRA adapter switches the model between concise and detailed thinking at locally uncertain points, which are identified by token-level entropy, without retraining or coordinating multiple models. Results on GSM8K, MATH-500, and AIME show shorter reasoning and improved efficiency while maintaining or improving accuracy, and the reasoning budget is controllable through the window size and uncertainty thresholds. The approach preserves both KV-cache reuse and base-model capabilities, offering a practical, memory-friendly path to more readable and cost-efficient reasoning in interactive settings.
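To make the entropy-guided switching idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single threshold and a fixed window, and it labels positions offline from precomputed entropies; the names `token_entropy`, `plan_modes`, `threshold`, and `window` are hypothetical. In an actual decoder the chosen mode would influence the tokens that follow, and "detailed" would correspond to changing the adapter state rather than assigning a label.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over `logits`."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def plan_modes(entropies, threshold, window):
    """Label each position 'concise' or 'detailed' (hypothetical policy).

    A position whose entropy exceeds `threshold` opens a detailed window
    covering that position and the next `window - 1` positions. A new
    high-entropy position inside an open window restarts the count.
    """
    modes = []
    remaining = 0  # positions left in the current detailed window
    for h in entropies:
        if h > threshold:
            remaining = window
        if remaining > 0:
            modes.append("detailed")
            remaining -= 1
        else:
            modes.append("concise")
    return modes


if __name__ == "__main__":
    # A uniform distribution over 4 tokens has entropy log(4).
    print(round(token_entropy([0.0, 0.0, 0.0, 0.0]), 4))
    # One uncertain position triggers a 3-position detailed window.
    print(plan_modes([0.1, 2.0, 0.1, 0.1, 0.1], threshold=1.0, window=3))
```

In this sketch, a larger `window` or a lower `threshold` spends more of the response in detailed mode, which is one simple way to picture the controllable budget described above.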
Abstract
Reasoning models enhance performance by tackling problems in a step-by-step manner, decomposing them into sub-problems and exploring long chains of thought before producing an answer. However, applying extended reasoning to every step introduces substantial redundancy, as sub-problems vary widely in difficulty and complexity: a small number of pivotal steps are genuinely challenging and decisive for the final answer, while many others only involve straightforward revisions or simple computations. Therefore, a natural idea is to endow reasoning models with the ability to adaptively respond to this variation, rather than treating all steps with the same level of elaboration. To this end, we propose MixReasoning, a framework that dynamically adjusts the depth of reasoning within a single response. The resulting chain of thought then becomes a mixture of detailed reasoning on difficult steps and concise inference on simpler ones. Experiments on GSM8K, MATH-500, and AIME show that MixReasoning shortens reasoning length and substantially improves efficiency without compromising accuracy.
