Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs

Jie Cao, Tianwei Lin, Zhenxuan Fan, Bo Yuan, Ziyuan Zhao, Rolan Yan, Wenqiao Zhang, Siliang Tang

TL;DR

Draft-Thinking guides models to first learn a concise, draft-style reasoning structure that retains only the critical reasoning steps, and introduces adaptive prompting, which makes reasoning depth a flexible, model-selectable behavior.

Abstract

Long chain-of-thought (CoT) has become a dominant paradigm for enhancing the reasoning capability of large reasoning models (LRMs); however, the performance gains often come with a substantial increase in reasoning budget. Recent studies show that existing CoT paradigms tend to induce systematic overthinking, unnecessarily coupling reasoning capability with reasoning cost. Most prior approaches reduce token usage through post hoc techniques such as token compression, truncation, or length penalties, without explicitly addressing the core mechanisms of reasoning. We propose Draft-Thinking, which guides models to first learn a concise draft-style reasoning structure that retains only the critical reasoning steps. Through progressive curriculum learning, the model stably internalizes this efficient reasoning pattern as its capability scales. Moreover, Draft-Thinking introduces adaptive prompting, which elevates reasoning depth to a flexible, model-selectable behavior. Extensive experiments demonstrate that Draft-Thinking substantially reduces reasoning budget while largely preserving reasoning performance; for example, on MATH500, it achieves an 82.6% reduction in reasoning budget at the cost of only a 2.6% performance drop.

Paper Structure

This paper contains 32 sections, 2 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Illustration of the student learning process. Students distill core knowledge from teachers into concise drafts, progressively refine it through practice, and ultimately achieve mastery.
  • Figure 2: Accuracy and token count comparison on the MATH500 dataset. Draft-Thinking achieves comparable or better accuracy with a smaller budget.
  • Figure 3: Example reasoning trace of the original Qwen3-8B and Draft-Thinking on a MATH500 question.
  • Figure 4: Reasoning behavior comparison between the original Qwen3-8B and Draft-Thinking on MATH500. Each bar represents the cumulative number of reasoning steps within a phase category.
  • Figure 5: Comparison of average accuracy and response length across different difficulty levels on MATH500.
  • ...and 7 more figures