Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning
Seyeon Kim, Joonhun Lee, Namhoon Cho, Sungjun Han, Wooseop Hwang
TL;DR
Conventional uncertainty-aware TD learning often assumes zero-mean Gaussian TD errors, which misrepresents tail behavior and therefore the associated uncertainty. The paper introduces a generalized Gaussian error framework whose shape parameter $\beta$ captures tail behavior. It derives a closed-form expression for aleatoric uncertainty and a risk-averse weighting scheme to mitigate epistemic uncertainty, complemented by a batch inverse error variance regularization. The method is instantiated with a beta-head that estimates $\beta$ (with the scale fixed at $\alpha=1$) and is evaluated with SAC and PPO on MuJoCo and discrete-control tasks, where it shows improved sample efficiency and robustness. The work advances uncertainty-aware RL by accommodating non-Gaussian TD errors and providing practical mechanisms to balance aleatoric and epistemic uncertainty during learning.
Abstract
Conventional uncertainty-aware temporal difference (TD) learning often assumes a zero-mean Gaussian distribution for TD errors, leading to inaccurate error representations and compromised uncertainty estimation. We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning that makes error distribution modeling more flexible by incorporating an additional higher-order moment, kurtosis, thereby improving the estimation and mitigation of data-dependent aleatoric uncertainty. We examine the influence of the shape parameter of the generalized Gaussian distribution (GGD) on aleatoric uncertainty and provide a closed-form expression that demonstrates an inverse relationship between uncertainty and the shape parameter. Additionally, we propose a theoretically grounded weighting scheme that addresses epistemic uncertainty by fully leveraging the GGD. We refine batch inverse variance weighting with bias reduction and kurtosis considerations, enhancing robustness. Experiments with policy gradient algorithms demonstrate significant performance gains.
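The inverse relationship between uncertainty and the shape parameter can be illustrated with the standard moments of the GGD. The sketch below assumes the common parameterization $f(x) = \frac{\beta}{2\alpha\Gamma(1/\beta)} \exp\!\left(-\left(|x-\mu|/\alpha\right)^{\beta}\right)$, under which the variance is $\alpha^2\,\Gamma(3/\beta)/\Gamma(1/\beta)$ and the kurtosis is $\Gamma(5/\beta)\Gamma(1/\beta)/\Gamma(3/\beta)^2$. The paper's own closed-form expression for aleatoric uncertainty may be stated differently, and the function names `ggd_variance` and `ggd_kurtosis` are hypothetical.

```python
import math


def ggd_variance(beta: float, alpha: float = 1.0) -> float:
    """Variance of a generalized Gaussian with scale alpha and shape beta.

    Assumes the density beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x - mu| / alpha) ** beta).
    """
    if beta <= 0 or alpha <= 0:
        raise ValueError("alpha and beta must be positive")
    return alpha ** 2 * math.gamma(3.0 / beta) / math.gamma(1.0 / beta)


def ggd_kurtosis(beta: float) -> float:
    """Kurtosis (not excess kurtosis) of the same distribution; independent of alpha."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return math.gamma(5.0 / beta) * math.gamma(1.0 / beta) / math.gamma(3.0 / beta) ** 2


if __name__ == "__main__":
    # With alpha fixed at 1, smaller beta means heavier tails and larger variance.
    for beta in (0.5, 1.0, 2.0, 4.0):
        print(f"beta={beta}: variance={ggd_variance(beta):.4f}, kurtosis={ggd_kurtosis(beta):.4f}")
```

In this parameterization $\beta=2$ recovers a Gaussian (with variance $\alpha^2/2$) and $\beta=1$ a Laplace distribution, so a learned $\beta$ below 2 corresponds to heavier-than-Gaussian tails in the TD error.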
