A unified uncertainty-aware exploration: Combining epistemic and aleatory uncertainty

Parvin Malekzadeh, Ming Hou, Konstantinos N. Plataniotis

TL;DR

The paper tackles exploration under both epistemic and aleatory uncertainty in reinforcement learning, showing that naive additive combinations can destabilize learning. It introduces UUaE, a belief-based distributional RL framework that maintains a belief over return-distribution parameters and uses moment-generating function features over a Dirac-delta belief to learn a composite uncertainty model via a Jensen-Tsallis loss. A risk-sensitive exploration rule that subtracts the estimated aleatory risk from the expected return guides action selection. The approach formalizes the connection between the two uncertainty sources, scales distributional RL to joint uncertainty estimation, and demonstrates improved stability and sample efficiency on challenging tasks like Atari games and autonomous driving simulations.

Abstract

Exploration is a significant challenge in practical reinforcement learning (RL), and uncertainty-aware exploration that incorporates the quantification of epistemic and aleatory uncertainty has been recognized as an effective exploration strategy. However, capturing the combined effect of aleatory and epistemic uncertainty for decision-making is difficult. Existing works estimate aleatory and epistemic uncertainty separately and consider the composite uncertainty as an additive combination of the two. Nevertheless, the additive formulation leads to excessive risk-taking behavior, causing instability. In this paper, we propose an algorithm that clarifies the theoretical connection between aleatory and epistemic uncertainty, unifies aleatory and epistemic uncertainty estimation, and quantifies the combined effect of both uncertainties for a risk-sensitive exploration. Our method builds on a novel extension of distributional RL that estimates a parameterized return distribution whose parameters are random variables encoding epistemic uncertainty. Experimental results on tasks with exploration and risk challenges show that our method outperforms alternative approaches.
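
The risk-sensitive exploration rule mentioned in the TL;DR, which subtracts the estimated aleatory risk from the expected return, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `risk_sensitive_action`, the `risk_weight` parameter, and the use of plain per-action lists are assumptions made for illustration only, and the summary does not specify how the paper estimates either quantity.

```python
from typing import Sequence


def risk_sensitive_action(
    expected_returns: Sequence[float],
    aleatory_risks: Sequence[float],
    risk_weight: float = 1.0,  # hypothetical trade-off coefficient, not from the paper
) -> int:
    """Pick the action maximizing expected return minus weighted aleatory risk."""
    if len(expected_returns) != len(aleatory_risks):
        raise ValueError("expected_returns and aleatory_risks must have equal length")
    if not expected_returns:
        raise ValueError("at least one action is required")

    # Score each action: a higher expected return helps, a higher aleatory risk hurts.
    scores = [
        mean - risk_weight * risk
        for mean, risk in zip(expected_returns, aleatory_risks)
    ]
    # Return the index of the best-scoring action (first one on ties).
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-action estimates for three actions.
means = [1.0, 1.2, 0.9]
risks = [0.1, 0.5, 0.05]
chosen = risk_sensitive_action(means, risks)
greedy = risk_sensitive_action(means, risks, risk_weight=0.0)
```

With `risk_weight=0.0` the rule reduces to greedy selection on the expected return, so in this example the high-return but high-risk action is chosen only when risk is ignored.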

Paper Structure (9 sections, 7 equations, 2 figures, 1 algorithm)

Figures (2)

  • Figure 1: Steps taken to derive our proposed $\text{UUaE}$ method. Orange and green shaded areas show learning of epistemic and aleatory uncertainty in $\text{UUaE}$.
  • Figure 2: Learning curves on two Atari games and an autonomous vehicle driving task across $10$ runs.