Reinforcement Learning: An Overview
Kevin Murphy
TL;DR
This survey synthesizes the major strands of reinforcement learning (RL), outlining how value-based, policy-based, and model-based approaches address the core problem of learning controllers for sequential decision making. It highlights the interplay between planning, learning from data, and handling partial observability and model uncertainty, while detailing key algorithms (e.g., Q-learning, DQN, PPO, SAC, MCTS, MuZero, Dreamer) and their theoretical underpinnings. A central theme is improving sample efficiency through world models and offline or off-policy methods, with extensive discussion of practical considerations, reward design, and experimental best practices. The work connects RL to inference, control theory, and game-theoretic formulations of model-based RL (MB-RL), and surveys both foundational ideas and recent variants that enable scalable, robust decision making in complex environments.
Abstract
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based methods, policy-based methods, model-based methods, multi-agent RL, LLMs and RL, and various other topics (e.g., offline RL, hierarchical RL, intrinsic reward). It also includes some code snippets for training LLMs with RL.
