A Survey of Exploration Methods in Reinforcement Learning
Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup
TL;DR
The paper surveys exploration methods for sequential decision making in reinforcement learning, organizing approaches into reward-free versus reward-based categories and further distinguishing memory-free from memory-based strategies. It covers early random and intrinsic-motivation techniques, randomized action selection (value-based and policy-search), optimism/bonus-based methods (tabular and function-approximation), prediction-error bonuses, and deliberate exploration through Bayesian and meta-learning frameworks, including PSRL and BAMCP. The work highlights theoretical underpinnings (PAC guarantees, UCB-style confidence bounds, and the BAMDP formulation), practical performance, and the challenges of evaluating exploration methods across diverse tasks and benchmarks. By compiling a taxonomy with representative methods, it gives practitioners a structured entry point for selecting exploration strategies suited to a problem's structure and the available computational resources.
Abstract
Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments. Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process, as a lack of sufficient information can hinder effective learning. In this article, we provide a survey of modern exploration methods in (sequential) reinforcement learning, as well as a taxonomy of these methods.
