Reasoning Beyond Limits: Advances and Open Problems for LLMs
Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah
TL;DR
This survey addresses the problem of enabling robust, transparent reasoning in large language models by synthesizing breakthroughs from 2023–2025. It catalogs the top 27 large reasoning models (LRMs), surveys training methodologies including mixture-of-experts (MoE), retrieval-augmented generation (RAG), chain-of-thought (CoT) prompting, RL-based alignment, and distillation, and details test-time compute strategies. Key contributions include a structured taxonomy of methods, a broad overview of model architectures, and a discussion of open challenges such as multi-step reasoning without human supervision, long-context retrieval, and cross-lingual alignment. The findings highlight a trend toward scalable reasoning through MoE, retrieval-augmented reasoning, and self-improvement loops, with strong implications for deploying efficient, domain-specific agents. Together, these insights guide the future design of cognitively capable, resource-efficient LLMs that can operate reliably across diverse tasks and modalities.
Abstract
Recent breakthroughs in generative reasoning have transformed how large language models (LLMs) tackle complex problems. These models can now dynamically retrieve and refine information while generating coherent, multi-step thought processes. Techniques such as inference-time scaling, reinforcement learning, supervised fine-tuning, and distillation have been successfully applied to models like DeepSeek-R1, OpenAI's o1 and o3, GPT-4o, Qwen-32B, and various Llama variants, resulting in enhanced reasoning capabilities. In this paper, we provide a comprehensive analysis of the top 27 LLMs released between 2023 and 2025 (including Mistral AI Small 3 24B, DeepSeek-R1, Search-o1, QwQ-32B, and phi-4). We then present an extensive overview of training methodologies, which covers:
- general training approaches;
- mixture-of-experts (MoE) and architectural innovations;
- retrieval-augmented generation (RAG);
- chain-of-thought and self-improvement techniques;
- test-time compute scaling;
- distillation;
- reinforcement learning (RL) methods.

Finally, we discuss key challenges in advancing LLM capabilities. These include improving multi-step reasoning without human supervision, overcoming limitations in chained tasks, balancing structured prompts with flexibility, and enhancing long-context retrieval and external tool integration.
