A Comprehensive Survey of Reinforcement Learning: From Algorithms to Practical Challenges
Majid Ghasemi, Amir Hossein Moosavi, Dariush Ebrahimi
TL;DR
This survey traces reinforcement learning (RL) from classical dynamic programming (DP) and Monte Carlo methods to modern deep RL (DRL), analyzing strengths, weaknesses, and practical considerations across domains. It classifies algorithms into value-based, policy-based, and actor-critic families, and contrasts model-free with model-based approaches, with attention to convergence and stability. The work offers practical guidance for algorithm selection, covering planning methods such as MCTS and Dyna-Q alongside DRL algorithms such as DQN, PPO, A3C, and TD3, and discusses challenges in scalability, sample efficiency, and exploration. Overall, the paper provides a structured, domain-aware reference with summaries and tables to help researchers and practitioners apply RL to real-world problems, while acknowledging that there is no one-size-fits-all solution.
Abstract
Reinforcement Learning (RL) has emerged as a powerful paradigm in Artificial Intelligence (AI), enabling agents to learn optimal behaviors through interactions with their environments. Building on trial-and-error learning, RL equips agents to make informed decisions using feedback in the form of rewards or penalties. This paper presents a comprehensive survey of RL, analyzing a wide range of algorithms, from foundational tabular methods to advanced Deep Reinforcement Learning (DRL) techniques. We categorize and evaluate these algorithms based on key criteria such as scalability, sample efficiency, and suitability, and compare their strengths and weaknesses across diverse settings. Additionally, we offer practical insights into the selection and implementation of RL algorithms, addressing common challenges such as convergence, stability, and the exploration-exploitation dilemma. This paper serves as a comprehensive reference for researchers and practitioners aiming to apply RL to complex, real-world problems.
