OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation

Shengjia Zhang, Junjie Wu, Jiawei Chen, Changwang Zhang, Zhe Li, Xingyu Lou, Wangchunshu Zhou, Sheng Zhou, Can Wang, Jun Wang

TL;DR

OThink-R1 is a hybrid reasoning framework that lets large reasoning models (LRMs) switch automatically between fast, direct answers and slow, step-by-step reasoning. It identifies patterns that distinguish essential from redundant reasoning, uses an auxiliary LLM judge to classify reasoning trajectories accordingly, and builds a hybrid fine-tuning dataset from the result. The model is then fine-tuned on this dataset with a dual KL-divergence constraint to balance efficiency and correctness. On question-answering and mathematical benchmarks, it reduces reasoning token usage significantly while maintaining competitive accuracy, pointing toward efficient, context-aware reasoning in LRMs.

Abstract

Human cognition operates through two complementary modes: fast intuitive thinking and slow deliberate thinking. Vanilla large language models (LLMs) predominantly follow the fast-thinking paradigm, producing immediate responses, whereas recent large reasoning models (LRMs) adopt slow-thinking strategies, generating detailed reasoning chains before arriving at answers. Although LRMs often achieve higher accuracy, this comes at the cost of substantially increased token usage. To address this efficiency-accuracy trade-off, we propose OThink-R1, a hybrid reasoning framework that integrates both modes within a single LRM and enables automatic mode switching based on problem characteristics. We first identify three major patterns of essential and redundant reasoning trajectories in LRMs, which guide the design of an auxiliary LLM-based judge that adaptively determines when slow thinking is necessary. Leveraging the judge's decisions, we construct a hybrid fine-tuning dataset by pruning redundant reasoning to produce fast-thinking samples and retaining complete reasoning for slow-thinking samples. This dataset is then used to fine-tune LRMs, equipping them with inherent autonomous mode-selection capabilities. Extensive experiments on mathematical and question-answering benchmarks show that OThink-R1 reduces reasoning token usage significantly while maintaining competitive accuracy. The code is available at https://github.com/AgenticIR-Lab/OThink-R1.
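
The dataset-construction step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Trajectory` type, the `build_hybrid_sample` helper, the `<think>` tag layout, and the keyword-based `toy_judge` are all hypothetical stand-ins chosen for illustration. In the paper the judge is an LLM prompted with the identified reasoning patterns, not a string match.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    """One LRM response, split into its parts (hypothetical structure)."""
    question: str
    reasoning: str  # the step-by-step chain produced before the answer
    answer: str


def build_hybrid_sample(traj: Trajectory,
                        is_redundant: Callable[[Trajectory], bool]) -> dict:
    """Turn a trajectory into a fine-tuning sample.

    If the judge flags the reasoning as redundant, the chain is pruned and
    the sample becomes a fast-thinking one; otherwise the complete chain is
    retained as a slow-thinking sample.
    """
    if is_redundant(traj):
        thinking, mode = "", "fast"
    else:
        thinking, mode = traj.reasoning, "slow"
    # Assumed output layout: reasoning wrapped in <think> tags, then the answer.
    completion = f"<think>\n{thinking}\n</think>\n{traj.answer}"
    return {"prompt": traj.question, "completion": completion, "mode": mode}


def toy_judge(traj: Trajectory) -> bool:
    """Hypothetical stand-in for the LLM judge: flags repeated self-checks."""
    return traj.reasoning.lower().count("let me double-check") >= 2
```

Passing the judge as a callable keeps the pruning logic independent of how redundancy is decided, so the toy rule could be swapped for a call to an actual judge model without changing `build_hybrid_sample`.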

Paper Structure

This paper contains 28 sections, 3 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Illustration of the proposed OThink-R1, which equips LRMs with adaptive hybrid reasoning ability. The pipeline consists of two main steps: ❶ Thinking Paradigm Identification. Distinctive patterns differentiating essential from redundant reasoning are extracted from LRM trajectories and organized as prompts that guide a dedicated LLM to act as a judge in classifying reasoning trajectories. ❷ Fine-tune with Hybrid Thinking Dataset. The hybrid dataset is constructed by removing redundant reasoning trajectories to form fast-thinking samples and preserving essential ones as slow-thinking samples. The model is then fine-tuned on this dataset with a dual KL-divergence constraint.
  • Figure 2: Comparison of DeepSeek-R1-Distill-Qwen-7B/OThink-R1-7B generated responses on CommonsenseQA.
  • Figure 3: Repeated Self-Validation on GSM8K.
  • Figure 4: Defensive Assumptions on GSM8K.
  • Figure 5: Multi-Solution Exploration on GSM8K.
  • ...and 4 more figures