Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement Learning

Alex Christopher Stutts; Danilo Erricolo; Theja Tulabandhula; Amit Ranjan Trivedi

TL;DR

This work tackles the challenge of disentangling aleatoric and epistemic uncertainty in distributional deep reinforcement learning. It introduces CEQR-DQN, which fuses calibrated quantile regression (QR-DQN) with deep evidential learning (Normal-Inverse-Gamma priors) and conformal inference to produce global uncertainty estimates that guide exploration. The method trains with quantile calibration and evidential calibration losses and selects actions by Thompson sampling over the resulting uncertainty estimates. On MinAtar, CEQR-DQN learns faster and reaches higher scores than comparable uncertainty-aware frameworks such as UA-DQN, illustrating the practical value of reliable uncertainty quantification for exploration in stochastic environments with out-of-distribution observations.
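To make the aleatoric/epistemic split concrete, the sketch below uses the standard closed-form moments of a Normal-Inverse-Gamma prior from the deep evidential regression literature. It is an illustration under assumptions, not necessarily the exact formulation used in CEQR-DQN: the function name `nig_uncertainties` and the parameter names `nu`, `alpha`, and `beta` are chosen here for the example.

```python
def nig_uncertainties(nu, alpha, beta):
    """Split predictive uncertainty under a Normal-Inverse-Gamma prior.

    Assumed (standard) parameterization: mu ~ N(gamma, sigma^2 / nu),
    sigma^2 ~ InverseGamma(alpha, beta). The moments below are only
    defined for nu > 0 and alpha > 1.
    """
    if nu <= 0.0 or alpha <= 1.0:
        raise ValueError("requires nu > 0 and alpha > 1")
    aleatoric = beta / (alpha - 1.0)         # E[sigma^2]: noise inherent in the data
    epistemic = beta / (nu * (alpha - 1.0))  # Var[mu]: uncertainty about the mean
    return aleatoric, epistemic
```

Under this parameterization, more evidence (larger `nu`) shrinks the epistemic term while leaving the aleatoric term unchanged, which is the behavior an exploration strategy would want to exploit.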

Abstract

We present a novel statistical approach to incorporating uncertainty awareness in model-free distributional reinforcement learning involving quantile regression-based deep Q networks. The proposed algorithm, $\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$, aims to address key challenges associated with separately estimating aleatoric and epistemic uncertainty in stochastic environments. It combines deep evidential learning with quantile calibration based on principles of conformal inference to provide explicit, sample-free computations of $\textit{global}$ uncertainty as opposed to $\textit{local}$ estimates based on simple variance, overcoming limitations of traditional methods in computational and statistical efficiency and handling of out-of-distribution (OOD) observations. Tested on a suite of miniaturized Atari games (i.e., MinAtar), CEQR-DQN is shown to surpass similar existing frameworks in scores and learning speed. Its ability to rigorously evaluate uncertainty improves exploration strategies and can serve as a blueprint for other algorithms requiring uncertainty awareness.
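As a rough illustration of quantile calibration based on conformal inference, the sketch below implements generic split-conformal calibration of a predicted interval (in the style of conformalized quantile regression). It is an assumption-laden example rather than the paper's calibration loss: the helper names `conformal_margin` and `calibrate`, and the held-out calibration lists, are hypothetical.

```python
import math

def conformal_margin(lower, upper, targets, alpha=0.1):
    """Margin that widens [lower, upper] to reach about (1 - alpha) coverage.

    lower, upper: predicted lower/upper quantiles on a held-out calibration set.
    targets: observed values for the same calibration points.
    """
    n = len(targets)
    # Conformity score: how far each target falls outside its interval
    # (negative when the target lies strictly inside).
    scores = sorted(max(lo - y, y - hi)
                    for lo, hi, y in zip(lower, upper, targets))
    # Finite-sample corrected rank of the empirical quantile.
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

def calibrate(lo, hi, margin):
    """Apply the calibration margin to a new predicted interval."""
    return lo - margin, hi + margin
```

For example, with four calibration intervals `[0, 10]` and targets `[5, 12, -1, 10]`, calling `conformal_margin` with `alpha=0.5` returns a margin of `1`, so a new interval `[0, 10]` becomes `[-1, 11]`.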

Paper Structure (10 sections, 16 equations, 4 figures, 1 table, 1 algorithm)

Introduction
Related Work
Uncertainty-Aware Reinforcement Learning
Deep Evidential Learning for Action Choice
Quantile Calibration via Conformal Methods
Calibrated Evidential Quantile Regression in DQN (CEQR-DQN)
Results
Conclusion
Model Architecture
Simulation Settings

Figures (4)

Figure 1: Deep Uncertainty-Aware Reinforcement Learning: In this study, we present a novel solution for separately quantifying aleatoric and epistemic uncertainty in distributional reinforcement learning involving deep Q networks. The proposed framework combines deep evidential learning with calibrated quantile regression based on conformal inference to significantly enhance agent exploration through uncertainty-aware action selection. The graphic above depicts a cautious, uncertainty-aware robot agent trying to escape a dangerous maze laden with traps and ambiguous paths; it was generated with the assistance of AI and post-edited.
Figure 2: Calibrated Evidential Quantile Regression Synthetic Example: Demonstration of aleatoric and epistemic uncertainty estimates by the proposed algorithm on a fresh test set including OOD samples from function $f(x) = \sin(3x) \cdot \cos(2x) + 0.5 \cdot e^{-x^2} + x^2 - 0.1x$ with added random noise $n\sim\mathcal{N}(0,1.5e^{-0.4|x|})$.
Figure 3: MinAtar Results: Comparison of second-order running average training scores between CEQR-DQN and UA-DQN across 25 random seeds on the simplified MinAtar games with $2.5$ million frames. Max episode scores for CEQR-DQN in each game are $158$, $4481$, $65$, $199$, and $4677$, while for UA-DQN they are $95$, $2457$, $69$, $95$, and $2231$. Results are also shown for CEQR-DQN without calibration.
Figure 4: Model Architecture for CEQR-DQN: A single-layer CNN extracts features from an environment state observation and feeds two separate linear network heads to output $N$ quantiles for $A$ actions and the $5^{th}$ and $95^{th}$ percentiles for each evidential parameter associated with each action.
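The synthetic example in Figure 2 can be reproduced approximately with the sketch below, which samples the stated function with input-dependent noise. Several details are assumptions: the second argument of $\mathcal{N}(\cdot,\cdot)$ is treated as a standard deviation (the caption does not say whether it is a standard deviation or a variance), and the sampling range, sample count, and seed are hypothetical because the caption does not specify them.

```python
import math
import random

def f(x):
    """Noise-free target function from the Figure 2 caption."""
    return (math.sin(3 * x) * math.cos(2 * x)
            + 0.5 * math.exp(-x ** 2) + x ** 2 - 0.1 * x)

def noise_scale(x):
    """Input-dependent noise scale 1.5 * exp(-0.4 * |x|).

    Assumption: used as a standard deviation, not a variance.
    """
    return 1.5 * math.exp(-0.4 * abs(x))

def sample(n, low, high, seed=0):
    """Draw n noisy (x, y) pairs with x uniform on [low, high].

    The range and seed are illustrative choices, not taken from the paper.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(low, high)
        y = f(x) + rng.gauss(0.0, noise_scale(x))
        data.append((x, y))
    return data
```

Because the noise scale decays with $|x|$, aleatoric uncertainty is largest near the origin; points drawn outside the training range play the role of the OOD samples, where epistemic uncertainty should grow instead.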