Uncertainty-driven Adaptive Exploration

Leonidas Bakopoulos, Georgios Chalkiadakis

TL;DR

Uncertainty-driven Adaptive Exploration (ADEU) addresses intra-episode switching between exploration and exploitation in deep reinforcement learning by tying action uncertainty to the sampling distribution. The method samples actions from a distribution $D$ with mean $\pi(s)$ and variance determined by $g(f(s))$, where $f(s)$ is a chosen uncertainty mechanism, enabling principled, context-aware exploration. ADEU is shown to generalize existing adaptive exploration approaches and to be compatible with various uncertainty measures, including intrinsic motivation and epistemic uncertainty frameworks. Empirical results in MuJoCo robotic tasks demonstrate that ADEU variants outperform standard exploration schemes and often surpass the original methods they extend, underscoring the plug-and-play nature of the framework. The work points to future directions in safe exploration and richer moment design to guide exploration more precisely while maintaining performance gains.
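The sampling rule in the summary above can be written as $a \sim D(\pi(s),\, g(f(s)))$. The following is a minimal sketch of that rule, not the authors' implementation. It assumes that $D$ is a diagonal Gaussian and that $g$ is a simple threshold function. The names `sample_action` and `threshold_g`, the threshold and variance values, and the uncertainty scores passed in are all hypothetical and chosen only for illustration.

```python
import numpy as np


def sample_action(policy_mean, uncertainty, g, rng):
    """Draw an action from a Gaussian with mean pi(s) and variance g(f(s)).

    policy_mean: the policy output pi(s), as a 1-D array.
    uncertainty: a scalar f(s) from any uncertainty mechanism (assumed given).
    g: a function mapping the uncertainty score to a non-negative variance.
    """
    variance = g(uncertainty)
    std = np.sqrt(np.maximum(variance, 0.0))
    noise = rng.standard_normal(np.shape(policy_mean))
    return policy_mean + std * noise


def threshold_g(uncertainty, threshold=0.5, explore_var=0.25):
    """Hypothetical g: add noise only when uncertainty exceeds a threshold."""
    return explore_var if uncertainty > threshold else 0.0


rng = np.random.default_rng(0)
pi_s = np.array([0.1, -0.3])  # illustrative policy output

# Low uncertainty: variance is zero, so the agent exploits pi(s) directly.
exploit_action = sample_action(pi_s, 0.2, threshold_g, rng)

# High uncertainty: variance is positive, so the agent explores around pi(s).
explore_action = sample_action(pi_s, 0.9, threshold_g, rng)
```

In this sketch, swapping `threshold_g` for a constant function would give fixed-variance Gaussian noise at every step, while a function that grows with the uncertainty score would scale exploration smoothly instead of switching it on and off.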

Abstract

Adaptive exploration methods learn complex policies by alternating between exploration and exploitation. An important question for such methods is when to switch from exploration to exploitation and back. This is critical in domains that require learning long and complex sequences of actions. In this work, we present a generic adaptive exploration framework that employs uncertainty to address this issue in a principled manner. Our framework includes previous adaptive exploration approaches as special cases. Moreover, the framework can incorporate any uncertainty-measuring mechanism of choice, for instance the mechanisms used in intrinsic motivation or in epistemic uncertainty-based exploration methods. We experimentally demonstrate that our framework gives rise to adaptive exploration strategies that outperform standard ones across several MuJoCo environments.

Paper Structure

This paper contains 14 sections, 5 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: The Frozen Lake environment, in which there is a single path connecting the initial state with the target. Exploration becomes more challenging when increasing the grid dimensions.
  • Figure 2: (Top) The blue line shows $\pi(s)$ across one episode; the red line shows the action selected by the agent. (Bottom) Uncertainty as calculated by the agent across the states of a single episode.