Echoes of Socratic Doubt: Embracing Uncertainty in Calibrated Evidential Reinforcement Learning
Alex Christopher Stutts, Danilo Erricolo, Theja Tulabandhula, Amit Ranjan Trivedi
TL;DR
This work tackles the challenge of disentangling aleatoric and epistemic uncertainty in distributional deep reinforcement learning. It introduces CEQR-DQN, which combines a quantile regression-based deep Q network (QR-DQN) with deep evidential learning (Normal-Inverse-Gamma priors) and conformal inference to produce global uncertainty estimates that guide exploration. The method trains with quantile calibration and evidential calibration losses and selects actions by Thompson sampling, so that action selection accounts for the estimated uncertainty. Empirical results on MinAtar show faster learning and higher scores than strong baselines, illustrating the practical value of reliable uncertainty quantification for exploration in stochastic environments with out-of-distribution observations.
Abstract
We present a novel statistical approach to incorporating uncertainty awareness in model-free distributional reinforcement learning involving quantile regression-based deep Q networks. The proposed algorithm, $\textit{Calibrated Evidential Quantile Regression in Deep Q Networks (CEQR-DQN)}$, aims to address key challenges associated with separately estimating aleatoric and epistemic uncertainty in stochastic environments. It combines deep evidential learning with quantile calibration based on principles of conformal inference to provide explicit, sample-free computations of $\textit{global}$ uncertainty as opposed to $\textit{local}$ estimates based on simple variance. This overcomes limitations of traditional methods in computational and statistical efficiency and in the handling of out-of-distribution (OOD) observations. Tested on a suite of miniaturized Atari games (i.e., MinAtar), CEQR-DQN is shown to surpass similar existing frameworks in scores and learning speed. Its ability to rigorously evaluate uncertainty improves exploration strategies and can serve as a blueprint for other algorithms requiring uncertainty awareness.
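To make the ingredients named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard closed-form moments of a Normal-Inverse-Gamma prior used in deep evidential regression (aleatoric variance $\beta/(\alpha-1)$, epistemic variance $\beta/(\nu(\alpha-1))$), a generic split-conformal margin for a pair of predicted quantiles, and Gaussian Thompson sampling over per-action value estimates. The function names `nig_uncertainty`, `conformal_margin`, and `thompson_action` are hypothetical, and the exact losses and calibration procedure of CEQR-DQN are not specified here.

```python
import math
import random


def nig_uncertainty(gamma, nu, alpha, beta):
    """Closed-form moments of a Normal-Inverse-Gamma(gamma, nu, alpha, beta) prior.

    Assumed (not taken from the paper): the standard evidential-regression
    formulas. Requires nu > 0, alpha > 1, beta > 0.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("need nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)  # expected noise variance E[sigma^2]
    epistemic = aleatoric / nu        # variance of the mean, Var[mu]
    return gamma, aleatoric, epistemic


def conformal_margin(lower, upper, targets, miscoverage):
    """Split-conformal margin for intervals [lower_i, upper_i] on held-out data.

    The score is how far each target falls outside its interval (negative if
    inside). Widening every interval by the returned margin gives roughly
    1 - miscoverage marginal coverage under exchangeability.
    """
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, targets))
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - miscoverage))  # finite-sample corrected rank
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[k - 1]


def thompson_action(q_means, epistemic_vars, rng=random):
    """Draw one value per action from N(mean, epistemic variance); return the argmax index."""
    samples = [rng.gauss(m, math.sqrt(v)) for m, v in zip(q_means, epistemic_vars)]
    return max(range(len(samples)), key=samples.__getitem__)
```

In this sketch, actions whose value estimates carry larger epistemic variance are sampled more widely and are therefore tried more often, which is the general mechanism by which uncertainty estimates can drive exploration.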
