Efficient Reasoning via Thought-Training and Thought-Free Inference

Canhui Wu, Qiong Cao, Chao Xue, Wei Xi, Xiaodong He

TL;DR

3TF introduces an asymmetric Short-to-Long framework that trains LLMs on explicit chain-of-thought to internalize reasoning while enforcing thought-free final outputs at inference. By decoupling reasoning during training from concise final-output inference, it preserves high reasoning quality with substantially shorter responses, and scales effectively across model sizes. Empirical results on GSM8K, MATH500, Olympiad, and AIME24 demonstrate strong accuracy–efficiency trade-offs, with larger models approaching full-CoT performance even in No-Think mode. This work offers a practical path to deployment of reasoning-capable LLMs with reduced inference overhead and token usage.

Abstract

Recent advances in large language models (LLMs) have leveraged explicit Chain-of-Thought (CoT) prompting to improve reasoning accuracy. However, most existing methods primarily focus on compressing verbose reasoning outputs. These Long-to-Short transformations aim to improve efficiency, but require a large amount of short CoT data. In this work, we introduce 3TF (Thought-Training and Thought-Free inference), a framework for efficient reasoning that takes a Short-to-Long perspective. We first train a hybrid model that can operate in both reasoning and non-reasoning modes, and then further train it on CoT-annotated data to internalize structured reasoning, while enforcing concise, thought-free outputs at inference time using the non-reasoning mode. Unlike compression-based approaches, 3TF improves the reasoning quality of non-reasoning outputs, enabling models to perform rich internal reasoning implicitly while keeping external outputs short. Empirically, 3TF-trained models obtain large improvements on reasoning benchmarks under thought-free inference, demonstrating that high-quality reasoning can be learned and executed implicitly without explicit step-by-step generation.
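
The asymmetry between training and inference can be illustrated with a minimal sketch. It assumes a hybrid model that marks its reasoning with `<think>` and `</think>` tags and treats an empty thought block as the non-reasoning mode. The tag strings and both function names are hypothetical choices for illustration; the paper's actual templates appear in its Figure 1 and are not reproduced here.

```python
# Assumed delimiters for the reasoning span (not taken from the paper).
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def format_training_target(reasoning: str, answer: str) -> str:
    """Thought-training target: explicit reasoning followed by the final answer."""
    return f"{THINK_OPEN}\n{reasoning.strip()}\n{THINK_CLOSE}\n\n{answer.strip()}"


def format_inference_prefix() -> str:
    """Thought-free prefix: an empty thought block, so decoding starts at the answer."""
    return f"{THINK_OPEN}\n\n{THINK_CLOSE}\n\n"


# Training sees the full chain of thought in the target ...
train_target = format_training_target("12 * 3 = 36, then 36 + 4 = 40.", "40")

# ... while inference pre-fills an empty thought block after the prompt.
inference_prefix = format_inference_prefix()
```

Under these assumptions, the model is supervised only on targets that contain reasoning, yet at inference the pre-filled empty block asks it to emit the answer directly.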

Paper Structure

This paper contains 28 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Overview of the 3TF Asymmetric Reasoning Paradigm. (Top) Training and inference templates. (Bottom) Flowchart: A Base model is trained on both Think and No-Think data, then fine-tuned on Think data. For No-Think Inference, the model uses the Inference Template to generate only the final answer.
  • Figure 2: Accuracy scaling of 3TF from 4B to 32B across four reasoning benchmarks. As model size increases, 3TF progressively closes the gap to the Base-Think mode, showing improved reasoning recoverability at scale.
  • Figure 3: Avg@k performance scaling of 3TF from 4B to 32B across four reasoning benchmarks. Similar to the pass@k accuracy results, 3TF progressively closes the gap to the Base-Think mode, showing that reasoning recoverability for this metric also improves with scale.
  • Figure 4: Impact of mixing No-Think data during training on Qwen3-8B across four reasoning benchmarks. Increasing the No-Think ratio consistently shortens model outputs (orange) but monotonically reduces accuracy (blue). Even 1% No-Think materially hurts reasoning fidelity, indicating high sensitivity to non-reasoning supervision.
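
The data-mixing ablation described in the Figure 4 caption can be sketched as follows. This is an illustrative sketch only: it assumes the No-Think ratio is the fraction of the final training set made up of No-Think examples, and the function name `mix_training_data` and its parameters are hypothetical rather than taken from the paper.

```python
import random


def mix_training_data(think_examples, no_think_examples, no_think_ratio, seed=0):
    """Return a shuffled list of (example, mode) pairs.

    Assumption: `no_think_ratio` is the share of No-Think items in the final mix.
    All Think examples are kept; No-Think examples are subsampled to match.
    """
    if not 0.0 <= no_think_ratio < 1.0:
        raise ValueError("no_think_ratio must be in [0, 1)")
    rng = random.Random(seed)
    n_think = len(think_examples)
    # Solve n_no / (n_think + n_no) = ratio for n_no.
    n_no_think = round(no_think_ratio * n_think / (1.0 - no_think_ratio))
    # Never request more No-Think examples than are available.
    n_no_think = min(n_no_think, len(no_think_examples))
    mixed = [(example, "think") for example in think_examples]
    mixed += [(example, "no_think") for example in rng.sample(no_think_examples, n_no_think)]
    rng.shuffle(mixed)
    return mixed
```

With 99 Think examples and a ratio of 0.01, this sketch adds a single No-Think example, which is the scale at which the caption already reports a measurable accuracy drop.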