Efficient Reasoning Models: A Survey

Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang

TL;DR

This survey analyzes the efficiency challenges in reasoning-enabled large language models and organizes existing work along three axes: shortening long chains of thought, building compact models with strong reasoning, and speeding up decoding. It covers a broad spectrum of methods, including RL-based length control, SFT on variable-length CoT data, prompt-driven routing, latent and implicit reasoning, distillation and quantization for smaller models, and advanced decoding strategies. The paper also covers evaluation metrics, datasets, and benchmarks, and discusses safety, multimodal extensions, and sustainability as key future directions. Overall, it outlines a framework for designing and evaluating efficient reasoning systems, with practical relevance for real-time and resource-constrained applications.

Abstract

Reasoning models have demonstrated remarkable progress in solving complex and logic-intensive tasks by generating extended Chains-of-Thought (CoTs) prior to arriving at a final answer. Yet the emergence of this "slow-thinking" paradigm, with numerous tokens generated in sequence, inevitably introduces substantial computational overhead, which highlights an urgent need for effective acceleration. This survey aims to provide a comprehensive overview of recent advances in efficient reasoning. It categorizes existing works into three key directions: (1) shorter - compressing lengthy CoTs into concise yet effective reasoning chains; (2) smaller - developing compact language models with strong reasoning capabilities through techniques such as knowledge distillation, other model compression techniques, and reinforcement learning; and (3) faster - designing efficient decoding strategies to accelerate inference of reasoning models. A curated collection of papers discussed in this survey is available in our GitHub repository: https://github.com/fscdc/Awesome-Efficient-Reasoning-Models.

Paper Structure

This paper contains 71 sections, 14 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Overview of efficient reasoning. We categorize existing efficient reasoning methods into three key directions based on how they improve reasoning efficiency: (1) make long CoTs short (shorter); (2) build small language models with strong reasoning ability (smaller); and (3) make decoding more efficient (faster).
  • Figure 2: Taxonomy of efficient reasoning.
  • Figure 3: Motivation for efficient reasoning. (Left) Models often exhibit overthinking, generating unnecessarily long reasoning chains even for simple tasks. (Middle) Longer reasoning is not always better: excessively verbose reasoning may reduce accuracy. (Right) Lengthy reasoning increases computational costs and poses safety risks. In addition, improving efficiency helps alleviate resource constraints and lower costs.