A Survey of Exploration Methods in Reinforcement Learning

Susan Amin, Maziar Gomrokchi, Harsh Satija, Herke van Hoof, Doina Precup

TL;DR

The paper surveys exploration methods for sequential decision making in reinforcement learning, organizing approaches into reward-free versus reward-based categories and further distinguishing memory-free from memory-based strategies. It covers early random and intrinsic-motivation techniques, randomized action selection (value-based and policy-search), optimism/bonus-based methods (tabular and function-approximation), prediction-error bonuses, and deliberate Bayesian/meta-learning frameworks, including PSRL and BAMCP. The work highlights theoretical guarantees (PAC, UCB, BAMDP), practical performance, and the challenges of evaluating exploration methods across diverse tasks and benchmarks. By compiling a comprehensive taxonomy and representative methods, it provides practitioners with a structured entry point to select exploration strategies aligned with problem structure and computational resources.

Abstract

Exploration is an essential component of reinforcement learning algorithms, where agents need to learn how to predict and control unknown and often stochastic environments. Reinforcement learning agents depend crucially on exploration to obtain informative data for the learning process, as a lack of sufficient information can hinder effective learning. In this article, we provide a survey of modern exploration methods in (sequential) reinforcement learning, as well as a taxonomy of exploration methods.

Paper Structure

This paper contains 38 sections, 34 equations, 2 figures, 9 tables.

Figures (2)

  • Figure 1: Exploration categories. The exploration methods are categorized into two main groups, reward-free and reward-based exploration techniques, depending on their use of extrinsic rewards. Each group is further divided into memory-based and memory-free categories, depending on whether the exploratory decisions rely on the agent's memory of the observed space.
  • Figure 2: A 2-state MDP with uncertain transition probabilities under (a) action 1 and (b) action 2. Rewards are denoted by $\pm 1$ in the states.