A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

Zixuan Ke; Fangkai Jiao; Yifei Ming; Xuan-Phi Nguyen; Austin Xu; Do Xuan Long; Minzhi Li; Chengwei Qin; Peifeng Wang; Silvio Savarese; Caiming Xiong; Shafiq Joty

TL;DR

This survey analyzes reasoning in large language models along two orthogonal dimensions: reasoning regime (inference-time scaling vs. learning-to-reason) and architecture (standalone LLMs, single-agent, and multi-agent systems). It presents a unifying input/output framework and catalogues key components (reasoners, verifiers, refiners) and common workflows (generator-evaluator, debates), while situating major milestones such as o1 and DeepSeek-R1. The authors outline learning algorithms (SFT, PPO, DPO, GRPO) and verifier designs (ORM/PRM/generative verifiers), and discuss emerging trends toward cost-aware and inference-aware training, as well as domain-specific reasoners and automated multi-agent design. The work highlights open challenges in evaluating and understanding reasoning, data efficiency, and the integration of agentic workflows, providing a foundation for designing scalable, interactive reasoning systems.

Abstract

Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from the conventional models that power chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, which define the stage at which reasoning is achieved (either at inference time or through dedicated training); and (2) Architectures, which determine the components involved in the reasoning process, distinguishing between standalone LLMs and agentic compound systems that incorporate external tools and multi-agent collaborations. Within each dimension, we analyze two key perspectives: (1) Input level, which focuses on techniques that construct high-quality prompts that the LLM conditions on; and (2) Output level, which focuses on methods that refine multiple sampled candidates to enhance reasoning quality. This categorization provides a systematic understanding of the evolving landscape of LLM reasoning, highlighting emerging trends such as the shift from inference scaling to learning-to-reason (e.g., DeepSeek-R1) and the transition to agentic workflows (e.g., OpenAI Deep Research, Manus Agent). Additionally, we cover a broad spectrum of learning algorithms, from supervised fine-tuning to reinforcement learning methods such as PPO and GRPO, as well as the training of reasoners and verifiers. We also examine key designs of agentic workflows, from established patterns like generator-evaluator and LLM debate to recent innovations. ...
