
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, Caiming Xiong, Shafiq Joty

TL;DR

This survey analyzes reasoning in large language models along two orthogonal dimensions: reasoning regime (inference-time scaling vs. learning-to-reason) and architecture (standalone LLMs, single-agent, and multi-agent systems). It presents a unifying input/output framework and catalogues key components (reasoners, verifiers, refiners) and common workflows (generator-evaluator, debates), while situating major milestones such as o1 and DeepSeek-R1. The authors outline learning algorithms (SFT, PPO, DPO, GRPO) and verifier designs (ORM/PRM/generative verifiers), and discuss emerging trends toward cost-aware and inference-aware training, as well as domain-specific reasoners and automated multi-agent design. The work highlights open challenges in evaluating and understanding reasoning, data efficiency, and the integration of agentic workflows, providing a foundation for designing scalable, interactive reasoning systems.

Abstract

Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from the conventional models that power chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, which define the stage at which reasoning is achieved (either at inference time or through dedicated training); and (2) Architectures, which determine the components involved in the reasoning process, distinguishing among standalone LLMs, agentic compound systems that incorporate external tools, and multi-agent collaborations. Within each dimension, we analyze two key perspectives: (1) Input level, which focuses on techniques that construct high-quality prompts for the LLM to condition on; and (2) Output level, which covers methods that refine multiple sampled candidates to enhance reasoning quality. This categorization provides a systematic understanding of the evolving landscape of LLM reasoning, highlighting emerging trends such as the shift from inference scaling to learning to reason (e.g., DeepSeek-R1) and the transition to agentic workflows (e.g., OpenAI Deep Research, Manus Agent). Additionally, we cover a broad spectrum of learning algorithms, from supervised fine-tuning to reinforcement learning algorithms such as PPO and GRPO, and the training of reasoners and verifiers. We also examine key designs of agentic workflows, from established patterns like generator-evaluator and LLM debate to recent innovations. ...
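
The output-level perspective (sample several candidates, then verify, refine, or aggregate them) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the survey's method: the `Reasoner`/`Verifier`/`Refiner` signatures, `sample_verify_refine`, the 0.5 score threshold, and the toy arithmetic components are all hypothetical stand-ins for LLM calls. `majority_vote` shows the verifier-free alternative of returning the most frequent answer.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical component signatures (illustrative only, not an API from the survey).
Reasoner = Callable[[str, int], List[str]]           # (query, n) -> n candidate answers
Verifier = Callable[[str, str], Tuple[float, str]]   # (query, answer) -> (score, critique)
Refiner = Callable[[str, str, str], str]             # (query, answer, critique) -> revision


def majority_vote(answers: List[str]) -> str:
    """Verifier-free aggregation: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def sample_verify_refine(query: str, reasoner: Reasoner, verifier: Verifier,
                         refiner: Refiner, n: int = 4, threshold: float = 0.5) -> str:
    """Sample n candidates, refine each low-scoring one once using the verifier's
    critique, then return the highest-scoring candidate (best-of-n)."""
    scored: List[Tuple[float, str]] = []
    for answer in reasoner(query, n):
        score, critique = verifier(query, answer)
        if score < threshold:
            answer = refiner(query, answer, critique)
            score, _ = verifier(query, answer)
        scored.append((score, answer))
    return max(scored, key=lambda pair: pair[0])[1]


# Hypothetical toy stand-ins for LLM calls on queries of the form "a+b".
def toy_reasoner(query: str, n: int) -> List[str]:
    return ["5", "6", "5", "4"][:n]


def toy_verifier(query: str, answer: str) -> Tuple[float, str]:
    a, b = (int(x) for x in query.split("+"))
    if answer == str(a + b):
        return 1.0, ""
    return 0.0, f"{answer} is not the sum of {a} and {b}"


def toy_refiner(query: str, answer: str, critique: str) -> str:
    # A real refiner would be an LLM conditioned on the critique; this toy recomputes.
    a, b = (int(x) for x in query.split("+"))
    return str(a + b)


print(majority_vote(toy_reasoner("2+3", 4)))
print(sample_verify_refine("2+3", toy_reasoner, toy_verifier, toy_refiner))
```

Refining each candidate at most once keeps the extra cost bounded at one refiner call and one verifier call per failing sample; iterating until the verifier is satisfied is an equally plausible design with a less predictable budget.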

Paper Structure

This paper contains 108 sections, 13 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 2: The proposed categorization over regimes, architectures, and unified perspectives in this survey.
  • Figure 3: Taxonomy of LLM reasoning research organized in this survey by regimes (inference scaling, learning to reason) and architectures (standalone LLM, single-agent, multi-agent). Each leaf node includes examples from the literature that focus on the corresponding category.
  • Figure 4: Three key components of a reasoning system. The Reasoner proposes new responses (usually accompanied by rationales) for a query. The Verifier takes as input a verification instruction (e.g., what aspects to evaluate) and the response(s) from the reasoner, then outputs a judgment on the response(s) (often in the form of a numeric score or relative order, and typically accompanied by a natural language critique or rationale for its judgment). The Refiner, unlike the first two, takes as input an incorrect response and optionally the critique (as provided by the verifier) and outputs a revised response.
  • Figure 5: Three architecture types used for designing a reasoning system in the context of LLMs. Highlighted elements mark the perspectives that the literature emphasizes for customization.
  • Figure 6: Inference-time and training-time regimes of a reasoning system. We use tree search as an example to illustrate inference scaling and trajectory collection. Given a query, inference scaling relies on extensive inference computation to improve the reasoner's distribution. Specifically, it generates multiple candidate reasoning steps at each layer of the tree and selects the best one to proceed (e.g., by using an external verifier or ensembling). In contrast, learning to reason focuses on collecting trajectories and training on the collected data, requiring minimal inference-time computation. It takes all trajectories produced in the process (identical to those used in inference scaling, so the same tree can be reused) and labels them with preferences. The preference data can then be used to train the reasoner.
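
The two regimes in Figure 6 can share a single search tree. The sketch below is an illustration under stated assumptions, not the authors' algorithm: `tree_search`, `propose`, `score`, the beam width, and the rule of labeling the best-scored sibling as chosen and the worst as rejected are all hypothetical choices. A step-level beam search returns the best trajectory (inference scaling), and the same expanded tree yields (prefix, chosen step, rejected step) preference pairs that could feed a preference-learning method such as DPO (learning to reason).

```python
from typing import Callable, List, Tuple

Prefix = Tuple[int, ...]           # a partial reasoning trajectory (sequence of steps)
Pair = Tuple[Prefix, int, int]     # (prefix, chosen next step, rejected next step)


def tree_search(
    propose: Callable[[Prefix], List[int]],  # hypothetical reasoner: candidate next steps
    score: Callable[[Prefix], float],        # hypothetical step-level verifier
    depth: int,
    beam: int,
) -> Tuple[Prefix, List[Pair]]:
    """Step-level beam search that also records sibling preferences."""
    frontier: List[Prefix] = [()]
    pairs: List[Pair] = []
    for _ in range(depth):
        candidates: List[Tuple[float, Prefix]] = []
        for parent in frontier:
            scored = [(score(parent + (s,)), parent + (s,)) for s in propose(parent)]
            best = max(scored, key=lambda t: t[0])
            worst = min(scored, key=lambda t: t[0])
            if best[0] > worst[0]:
                # Learning to reason: label the expanded siblings with a preference.
                pairs.append((parent, best[1][-1], worst[1][-1]))
            candidates.extend(scored)
        # Inference scaling: keep only the `beam` highest-scoring prefixes
        # (Python's sort is stable, so ties keep their expansion order).
        candidates.sort(key=lambda t: t[0], reverse=True)
        frontier = [prefix for _, prefix in candidates[:beam]]
    return frontier[0], pairs


# Hypothetical toy task: pick 3 steps from {1, 2, 3} whose sum hits TARGET.
TARGET = 7


def toy_propose(prefix: Prefix) -> List[int]:
    return [1, 2, 3]


def toy_score(prefix: Prefix) -> float:
    return -abs(TARGET - sum(prefix))


best_path, pairs = tree_search(toy_propose, toy_score, depth=3, beam=2)
print(best_path, len(pairs))
```

In this toy run the search keeps two prefixes per layer and returns a three-step trajectory summing to the target of 7, while every expanded parent whose children received unequal scores contributes one preference pair, so no extra sampling is needed to build the training data.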