Disentangling Uncertainties by Learning Compressed Data Representation

Zhiyu An; Zhibo Hou; Wan Du

TL;DR

This work addresses the separation of epistemic uncertainty (EU) and aleatoric uncertainty (AU) in learned system dynamics. It introduces the Compressed Data Representation Model (CDRM), which learns a neural encoding of the data distribution and can sample from arbitrary next-state distributions via Langevin dynamics. CDRM is framed as an Energy-Based Model with a binary-label training objective and a Langevin-based inference procedure. From this framing, the authors derive uncertainty estimates that combine epistemic signals based on kernel density estimation (KDE) with model-driven confidence, yielding separate AU and EU estimates. Theoretical analysis shows memory and computational advantages over bin-based compression in high-dimensional spaces. Experiments on toy datasets and a room-exploration task demonstrate stronger disentanglement of AU and EU and effective handling of multimodal next-state distributions. Overall, CDRM offers a principled, scalable mechanism for uncertainty-aware prediction in control and reinforcement learning, with practical benefits for safe and efficient exploration and policy transfer.

Abstract

We study aleatoric and epistemic uncertainty estimation in a learned regressive system dynamics model. Disentangling aleatoric uncertainty (the inherent randomness of the system) from epistemic uncertainty (the lack of data) is crucial for downstream tasks such as risk-aware control and reinforcement learning, efficient exploration, and robust policy transfer. While existing approaches like Gaussian Processes, Bayesian networks, and model ensembles are widely adopted, they suffer from either high computational complexity or inaccurate uncertainty estimation. To address these limitations, we propose the Compressed Data Representation Model (CDRM), a framework that learns a neural network encoding of the data distribution and enables direct sampling from the output distribution. Our approach incorporates a novel inference procedure based on Langevin dynamics sampling, allowing CDRM to predict arbitrary output distributions rather than being constrained to a Gaussian prior. Theoretical analysis provides the conditions where CDRM achieves better memory and computational complexity compared to bin-based compression methods. Empirical evaluations show that CDRM demonstrates a superior capability to identify aleatoric and epistemic uncertainties separately, achieving AUROCs of 0.8876 and 0.9981 on a single test set containing a mixture of both uncertainties. Qualitative results further show that CDRM's capability extends to datasets with multimodal output distributions, a challenging scenario where existing methods consistently fail. Code and supplementary materials are available at https://github.com/ryeii/CDRM.
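To make the Langevin-based sampling idea concrete, the following is a minimal sketch of unadjusted Langevin dynamics on a toy one-dimensional energy function with two modes. It is not the authors' implementation: the hand-written energy stands in for a learned CDRM network, and the names `energy_grad` and `langevin_sample`, the mode centers, the step size, and the number of steps are all assumptions chosen for illustration.

```python
import numpy as np

def energy_grad(y, centers=(-2.0, 2.0), sigma=0.5):
    """Gradient of a toy energy E(y) = -log sum_k exp(-(y - c_k)^2 / (2 sigma^2)).

    Assumption: this analytic bimodal energy replaces a learned network.
    """
    c = np.asarray(centers)[:, None]                      # shape (K, 1)
    logits = -(y[None, :] - c) ** 2 / (2.0 * sigma ** 2)  # shape (K, N)
    logits = logits - logits.max(axis=0, keepdims=True)   # numerical stability
    w = np.exp(logits)
    w = w / w.sum(axis=0, keepdims=True)                  # per-mode weights
    # dE/dy = sum_k w_k * (y - c_k) / sigma^2
    return (w * (y[None, :] - c)).sum(axis=0) / sigma ** 2

def langevin_sample(grad_fn, y0, step=0.01, n_steps=500, rng=None):
    """Unadjusted Langevin update: y <- y - step * grad E(y) + sqrt(2 * step) * noise."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.array(y0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y - step * grad_fn(y) + np.sqrt(2.0 * step) * noise
    return y

rng = np.random.default_rng(0)
y0 = rng.uniform(-4.0, 4.0, size=2000)   # many independent chains
samples = langevin_sample(energy_grad, y0, rng=rng)
left_fraction = float(np.mean(samples < 0.0))
print(f"fraction of samples near the left mode: {left_fraction:.2f}")
```

Because the chains follow the energy gradient rather than fitting a single Gaussian, the samples settle around both modes, which is the kind of multimodal output distribution the abstract refers to.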
