Efficient Exploration in Deep Reinforcement Learning: A Novel Bayesian Actor-Critic Algorithm

Nikolai Rozanov

TL;DR

The work investigates data-efficient exploration in deep reinforcement learning by marrying Bayesian uncertainty with actor-critic methods. It introduces a Bayesian Actor-Critic algorithm and a Frequentist Thompson DQN baseline, and evaluates them using bandits and Google's bsuite with an open-source BARL framework. The empirical results demonstrate improved exploration efficiency and faster convergence for Bayesian exploration and actor-critic approaches, particularly in long-horizon and sparse-reward tasks. The study provides a practical foundation for scalable, uncertainty-aware RL and outlines a clear path for future exploration in model-based extensions, curiosity-driven methods, and multi-agent settings.

Abstract

Reinforcement learning (RL), and deep reinforcement learning (DRL) in particular, have the potential to disrupt, and are already changing, the way we interact with the world. One of the key indicators of their applicability is their ability to scale and work in real-world scenarios, that is, in large-scale problems. This scale can be achieved via a combination of factors: the algorithm's ability to make use of large amounts of data and computational resources, and the efficient exploration of the environment for viable solutions (i.e. policies). In this work, we investigate and motivate some theoretical foundations for deep reinforcement learning. We start with exact dynamic programming and work our way up to stochastic approximations, and then to stochastic approximations for a model-free scenario, which forms the theoretical basis of modern reinforcement learning. We present an overview of this highly varied and rapidly changing field from the perspective of Approximate Dynamic Programming. We then focus our study on the shortcomings, with respect to exploration, of the cornerstone approaches (i.e. DQN, DDQN, A2C) in deep reinforcement learning. On the theory side, our main contribution is the proposal of a novel Bayesian actor-critic algorithm. On the empirical side, we evaluate Bayesian exploration as well as actor-critic algorithms on standard benchmarks and on state-of-the-art evaluation suites, and show the benefits of both approaches over current state-of-the-art deep RL methods. We release all the implementations and provide a full Python library that is easy to install, which we hope will serve the reinforcement learning community in a meaningful way and provide a strong foundation for future work.

Paper Structure (54 sections, 50 equations, 5 figures, 14 tables, 3 algorithms)

Figures (5)

  • Figure 1: The control setting: Agent interacting with an environment and receiving a reward signal.
  • Figure 2: Reward per time-step in a 12-armed bandit over 50 runs (without random baseline).
  • Figure 3: Reward per time-step in a 12-armed bandit over 50 runs (without optimistic baseline).
  • Figure 4: Reward per time-step in a 12-armed bandit over 50 runs (all agents).
  • Figure 5: Comparison of three agents across all 25 tasks.
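
To give a concrete feel for the kind of bandit comparison summarised in Figures 2 to 4, the following is a minimal sketch of Thompson sampling against a uniformly random baseline on a 12-armed bandit averaged over 50 runs. It is not taken from the paper's library: the Bernoulli reward model, the Beta(1, 1) priors, the arm probabilities in `ARM_PROBS`, the step count, and the function names `run_bandit` and `average_reward` are all assumptions made for illustration.

```python
import random

# Hypothetical success probabilities for 12 Bernoulli arms (not from the paper).
ARM_PROBS = [0.1 + 0.07 * i for i in range(12)]


def run_bandit(policy, arm_probs, steps, rng):
    """Play one run and return the list of rewards, one per time-step."""
    n = len(arm_probs)
    successes = [0] * n  # observed rewards of 1 per arm
    failures = [0] * n   # observed rewards of 0 per arm
    rewards = []
    for _ in range(steps):
        if policy == "thompson":
            # Draw one plausible mean per arm from its Beta posterior
            # (Beta(1, 1) prior), then act greedily on the draws.
            draws = [
                rng.betavariate(successes[a] + 1, failures[a] + 1)
                for a in range(n)
            ]
            arm = max(range(n), key=draws.__getitem__)
        else:
            # Random baseline: ignore everything observed so far.
            arm = rng.randrange(n)
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards.append(reward)
    return rewards


def average_reward(policy, arm_probs, steps=500, runs=50, seed=0):
    """Mean reward per time-step, averaged over independent runs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += sum(run_bandit(policy, arm_probs, steps, rng))
    return total / (runs * steps)


if __name__ == "__main__":
    for name in ("thompson", "random"):
        print(name, round(average_reward(name, ARM_PROBS), 3))
```

Sampling from the posterior rather than acting on its mean is what drives exploration here: arms with few observations have wide posteriors and so are occasionally drawn as the best arm, while well-explored poor arms are rarely selected.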