Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models
Lucas Berry, Axel Brando, David Meger
TL;DR
DECU estimates epistemic uncertainty in large conditional diffusion models by combining an efficient, partially frozen ensemble of latent diffusion models with Pairwise-Distance Estimators (PaiDEs). Each ensemble component trains only a compact class embedding on top of a shared pre-trained backbone, which cuts the number of trainable parameters from hundreds of millions to about 512k. PaiDEs then estimate, in latent space, the mutual information between model outputs and weights, and this quantity serves as the epistemic uncertainty measure. Evaluations on ImageNet show that DECU assigns higher epistemic uncertainty to under-sampled classes and can also produce per-pixel uncertainty maps. The result is a scalable tool for interpretability and safety in large generative models, with possible uses in monitoring and risk assessment for real-world deployments.
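The parameter savings can be made concrete with a little arithmetic. The sketch below is illustrative only. It assumes each ensemble member owns one class-embedding table and shares a frozen backbone. The sizes used (1000 classes, a 512-dimensional embedding, a 400M-parameter backbone) are hypothetical placeholders chosen to be consistent with the "about 512k" and "hundreds of millions" figures above, not values confirmed by the source.

```python
# Hypothetical sizes; none of these numbers come from the authors' code.
BACKBONE_PARAMS = 400_000_000  # placeholder for "hundreds of millions" (frozen, shared)
N_CLASSES = 1000               # assumed number of conditioning classes
EMBED_DIM = 512                # assumed width of the class embedding


def trainable_params_per_member(n_classes: int, embed_dim: int) -> int:
    """Size of one class-embedding table, the only part a member trains."""
    return n_classes * embed_dim


def ensemble_trainable_params(n_members: int, n_classes: int, embed_dim: int) -> int:
    """Trainable parameters across the ensemble; the backbone adds nothing."""
    return n_members * trainable_params_per_member(n_classes, embed_dim)


def trainable_fraction(n_classes: int, embed_dim: int, backbone_params: int) -> float:
    """Share of one member's parameters that actually receive gradients."""
    trainable = trainable_params_per_member(n_classes, embed_dim)
    return trainable / (trainable + backbone_params)


if __name__ == "__main__":
    per_member = trainable_params_per_member(N_CLASSES, EMBED_DIM)
    print("trainable per member:", per_member)
    print("trainable for 5 members:", ensemble_trainable_params(5, N_CLASSES, EMBED_DIM))
    print("fraction trained:", trainable_fraction(N_CLASSES, EMBED_DIM, BACKBONE_PARAMS))
```

Under these assumed sizes each member trains 512,000 values, a small fraction of a percent of the full model. This is why adding ensemble members is cheap compared with training each diffusion model from scratch.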
Abstract
Generative diffusion models, notable for their large parameter counts (exceeding 100 million) and their operation in high-dimensional image spaces, pose significant challenges for traditional uncertainty estimation methods because of the computation those methods require. In this work, we introduce Diffusion Ensembles for Capturing Uncertainty (DECU), a framework for estimating epistemic uncertainty in diffusion models. DECU efficiently trains ensembles of conditional diffusion models by keeping a static set of pre-trained parameters fixed, which drastically reduces both the computational burden and the number of parameters that require training. In addition, DECU employs Pairwise-Distance Estimators (PaiDEs) to measure epistemic uncertainty by estimating the mutual information between model outputs and weights in high-dimensional spaces. Experiments on the ImageNet dataset demonstrate that the framework captures epistemic uncertainty, particularly for under-sampled image classes.
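To illustrate the mutual-information estimate mentioned above, here is a minimal sketch of a pairwise-distance estimator for an ensemble. It assumes each ensemble member's output is summarized as a diagonal Gaussian, and it uses the standard pairwise form I ≈ -Σ_i π_i ln Σ_j π_j exp(-D(p_i ‖ p_j)). The function names, the uniform-weight default, and the choice of KL and Bhattacharyya distances are assumptions made for illustration and are not taken from the authors' implementation.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal Gaussians given as lists of means/variances."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def bhattacharyya_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance between two diagonal Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        v = 0.5 * (vp + vq)  # averaged variance for this dimension
        total += 0.125 * (mp - mq) ** 2 / v + 0.5 * math.log(v / math.sqrt(vp * vq))
    return total


def paide_mutual_information(means, variances, distance, weights=None):
    """Pairwise-distance estimate of I(output; ensemble index).

    means, variances: one list per ensemble member.
    distance: callable(mu_p, var_p, mu_q, var_q) -> float.
    weights: member probabilities; uniform if omitted (an assumption).
    """
    m = len(means)
    if weights is None:
        weights = [1.0 / m] * m
    estimate = 0.0
    for i in range(m):
        inner = sum(
            weights[j] * math.exp(-distance(means[i], variances[i], means[j], variances[j]))
            for j in range(m)
        )
        estimate -= weights[i] * math.log(inner)
    return estimate


if __name__ == "__main__":
    # Two hypothetical members that disagree slightly on a 2-D output.
    mus = [[0.0, 0.0], [1.0, 1.0]]
    vars_ = [[1.0, 1.0], [1.0, 1.0]]
    print("KL-based estimate:", paide_mutual_information(mus, vars_, kl_diag_gauss))
    print("Bhattacharyya-based estimate:",
          paide_mutual_information(mus, vars_, bhattacharyya_diag_gauss))
```

When all members agree, every pairwise distance is zero and the estimate is zero. When members are far apart, the estimate approaches ln M for M equally weighted members. In this formulation the KL-based estimate is never smaller than the Bhattacharyya-based one, so the two can be read as a bracket around the epistemic uncertainty.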
