Efficient Reasoning Models: A Survey
Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang
TL;DR
This survey analyzes the efficiency challenges of reasoning-enabled large language models and organizes existing work along three axes: shortening long chains of thought (CoTs), building compact models with strong reasoning ability, and speeding up decoding. It covers a broad spectrum of methods, including RL-based length control, supervised fine-tuning (SFT) on variable-length CoT data, prompt-driven routing, latent and implicit reasoning, distillation and quantization for smaller models, and advanced decoding strategies. The paper also reviews evaluation metrics, datasets, and benchmarks, and identifies safety, multimodal extensions, and sustainability as key future directions. Overall, it outlines a comprehensive framework for designing and evaluating efficient reasoning systems, with practical relevance for real-time and resource-constrained applications.
Abstract
Reasoning models have made remarkable progress on complex, logic-intensive tasks by generating extended Chain-of-Thoughts (CoTs) before arriving at a final answer. However, this "slow-thinking" paradigm, which generates numerous tokens in sequence, inevitably introduces substantial computational overhead, creating an urgent need for effective acceleration. This survey provides a comprehensive overview of recent advances in efficient reasoning. It categorizes existing work into three key directions: (1) shorter: compressing lengthy CoTs into concise yet effective reasoning chains; (2) smaller: developing compact language models with strong reasoning capabilities through knowledge distillation, other model compression techniques, and reinforcement learning; and (3) faster: designing efficient decoding strategies to accelerate inference of reasoning models. A curated collection of the papers discussed in this survey is available in our GitHub repository: https://github.com/fscdc/Awesome-Efficient-Reasoning-Models.
