Reasoning Beyond Limits: Advances and Open Problems for LLMs

Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah

TL;DR

This survey addresses the problem of enabling robust, transparent reasoning in large language models by synthesizing breakthroughs from 2023–2025. It catalogs the top 27 large reasoning models (LRMs) and surveys training methodologies including mixture-of-experts (MoE), retrieval-augmented generation (RAG), chain-of-thought (CoT), RL-based alignment, and distillation, while detailing test-time compute strategies. Key contributions include a structured taxonomy of methods, an open panorama of model architectures, and a discussion of open challenges such as multi-step reasoning without human supervision, long-context retrieval, and cross-lingual alignment. The findings highlight a trend toward scalable reasoning through Mixture-of-Experts, retrieval-augmented reasoning, and self-improvement loops, with strong implications for deploying efficient, domain-specific agents. Together, these insights guide the future design of cognitively capable, resource-efficient LLMs that can operate reliably across diverse tasks and modalities.

Abstract

Recent generative reasoning breakthroughs have transformed how large language models (LLMs) tackle complex problems by dynamically retrieving and refining information while generating coherent, multi-step thought processes. Techniques such as inference-time scaling, reinforcement learning, supervised fine-tuning, and distillation have been successfully applied to models like DeepSeek-R1, OpenAI's o1 and o3, GPT-4o, Qwen-32B, and various Llama variants, resulting in enhanced reasoning capabilities. In this paper, we provide a comprehensive analysis of the top 27 LLMs released between 2023 and 2025 (including models such as Mistral AI Small 3 24B, DeepSeek-R1, Search-o1, QwQ-32B, and phi-4). We then present an extensive overview of training methodologies that spans general training approaches, mixture-of-experts (MoE) and architectural innovations, retrieval-augmented generation (RAG), chain-of-thought and self-improvement techniques, as well as test-time compute scaling, distillation, and reinforcement learning (RL) methods. Finally, we discuss the key challenges in advancing LLM capabilities, including improving multi-step reasoning without human supervision, overcoming limitations in chained tasks, balancing structured prompts with flexibility, and enhancing long-context retrieval and external tool integration.

Paper Structure

This paper contains 113 sections, 32 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Survey Structure.
  • Figure 2: Thought Preference Optimization wu2024thinking – The process begins by instructing the language model to produce an internal reasoning sequence before generating its final answer. After multiple candidate responses are sampled, their answers are passed to an evaluator model that identifies the most and least preferred options. The complete outputs corresponding to these top and bottom choices serve as accepted and rejected examples for DPO training rafailov2023direct. This cycle is repeated over multiple training iterations.
  • Figure 3: Direct Preference Optimization (DPO) rafailov2023direct – DPO aligns models with human preferences without a reinforcement learning stage. Traditional approaches for fine-tuning language models with human feedback christiano2017deep typically train a reward model on a dataset of prompts and preference annotations, then use RL to find a policy that maximizes the learned reward. By contrast, DPO directly optimizes the policy to satisfy these preferences with a simple classification objective, avoiding both explicit reward modeling and RL.
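The Figure 3 caption describes DPO's classification-style objective. It can be written as a logistic loss on a margin. That margin is the difference between two policy-to-reference log-probability ratios, one for the chosen response and one for the rejected response.

Below is a minimal plain-Python sketch. It assumes the summed token log-probabilities of each full response, given the prompt, have already been computed under both the trainable policy and a frozen reference model. The function names and the default beta of 0.1 are illustrative choices, not values taken from the survey.

```python
import math


def neg_log_sigmoid(z: float) -> float:
    """Return -log(sigmoid(z)) = log(1 + exp(-z)) without overflow."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    # For negative z, rewrite as -z + log(1 + exp(z)) so exp stays bounded.
    return -z + math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative temperature, an assumption
) -> float:
    """DPO loss for a single (prompt, chosen, rejected) triple.

    Each argument is the summed token log-probability of a full response
    given the prompt, under the trainable policy or the frozen reference.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Logistic (binary classification) loss on the scaled reward margin.
    margin = beta * (chosen_logratio - rejected_logratio)
    return neg_log_sigmoid(margin)
```

The loss behaves as follows:

- **Before training:** the policy still equals the reference, so the margin is zero and the loss is log 2 (about 0.693).
- **During training:** raising the policy's relative likelihood of the chosen response pushes the loss below log 2.

In this sense a binary classification loss stands in for the separate reward model and RL loop.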
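The Figure 2 caption describes how Thought Preference Optimization builds its training data. This is a hypothetical illustration, not the authors' code, and every name in it is an assumption:

- `generate` stands in for a model prompted to write an internal thought followed by a final answer.
- `judge` stands in for the evaluator model, which scores only the final answer.
- `num_samples` is the number of responses drawn per prompt.

The complete outputs of the highest- and lowest-scored samples become the accepted and rejected examples for DPO.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    thought: str  # internal reasoning produced before the answer
    answer: str   # final response shown to the judge


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # full output (thought + answer) of the best-scored sample
    rejected: str  # full output (thought + answer) of the worst-scored sample


def full_output(sample: Sample) -> str:
    """Concatenate thought and answer into the complete generated text."""
    return sample.thought + "\n" + sample.answer


def build_tpo_pair(
    prompt: str,
    generate: Callable[[str], Sample],   # hypothetical: model told to think, then answer
    judge: Callable[[str, str], float],  # hypothetical evaluator scoring (prompt, answer)
    num_samples: int = 4,                # assumed number of sampled responses
) -> Optional[PreferencePair]:
    """Sample several thought+answer outputs; keep best and worst as a DPO pair."""
    if num_samples < 2:
        raise ValueError("need at least two samples to form a preference pair")
    samples: List[Sample] = [generate(prompt) for _ in range(num_samples)]
    # The judge sees only the final answers, not the internal thoughts.
    scores = [judge(prompt, s.answer) for s in samples]
    best = max(range(num_samples), key=scores.__getitem__)
    worst = min(range(num_samples), key=scores.__getitem__)
    if scores[best] == scores[worst]:
        return None  # all answers tied: no preference signal for this prompt
    return PreferencePair(prompt, full_output(samples[best]), full_output(samples[worst]))
```

In the full procedure, the pairs collected over many prompts would feed one round of DPO training. Fresh samples would then be drawn from the updated model, and the cycle would repeat.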