Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models

Lucas Berry, Axel Brando, David Meger

TL;DR

DECU tackles the challenge of estimating epistemic uncertainty in large conditional diffusion models by combining an efficient, partially frozen ensemble within a latent diffusion model with Pairwise-Distance Estimators (PaiDEs). By training only a compact class embedding per ensemble component and reusing pre-trained backbones, it cuts the trainable parameter count from hundreds of millions to about 512k while still supporting reliable mutual-information-based uncertainty estimates via PaiDEs in latent space. Evaluations on ImageNet show that DECU assigns higher epistemic uncertainty to under-sampled classes and provides per-pixel uncertainty estimates, offering a scalable tool for interpretability and safety in large generative models. The approach advances uncertainty quantification in high-dimensional generative tasks and suggests practical paths for monitoring and risk assessment in real-world deployments.
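
One way to read the 512k figure (an assumption on our part, not dimensions stated here): an embedding table over ImageNet's 1,000 classes with width 512 has exactly 1000 × 512 = 512,000 trainable parameters. Below is a minimal PyTorch sketch of the partially frozen ensemble under that assumption, with a toy module standing in for the pre-trained latent-diffusion backbone:

```python
import torch.nn as nn

NUM_CLASSES = 1000      # ImageNet class labels
EMBED_DIM = 512         # assumed width: 1000 * 512 = 512k parameters
NUM_COMPONENTS = 5      # assumed ensemble size (matches the ln(5) bound in Figure 3)

# Toy stand-in for the pre-trained latent-diffusion backbone (encoder, UNet,
# decoder); in DECU these weights are shared across components and frozen.
backbone = nn.Conv2d(4, 4, kernel_size=3, padding=1)
for p in backbone.parameters():
    p.requires_grad_(False)             # nothing in the backbone is trained

# One trainable class-embedding table per ensemble component, randomly
# initialized and trained on a different subset of the data for diversity.
class_embeddings = nn.ModuleList(
    nn.Embedding(NUM_CLASSES, EMBED_DIM) for _ in range(NUM_COMPONENTS)
)

print(sum(p.numel() for p in class_embeddings[0].parameters()))  # 512000
```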

Abstract

Generative diffusion models, notable for their large parameter count (exceeding 100 million) and operation within high-dimensional image spaces, pose significant challenges for traditional uncertainty estimation methods due to computational demands. In this work, we introduce an innovative framework, Diffusion Ensembles for Capturing Uncertainty (DECU), designed for estimating epistemic uncertainty for diffusion models. The DECU framework introduces a novel method that efficiently trains ensembles of conditional diffusion models by incorporating a static set of pre-trained parameters, drastically reducing the computational burden and the number of parameters that require training. Additionally, DECU employs Pairwise-Distance Estimators (PaiDEs) to accurately measure epistemic uncertainty by evaluating the mutual information between model outputs and weights in high-dimensional spaces. The effectiveness of this framework is demonstrated through experiments on the ImageNet dataset, highlighting its capability to capture epistemic uncertainty, specifically in under-sampled image classes.
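
Concretely, the quantity being estimated is the mutual information between an output $\mathbf{y}$ and the ensemble's weights $\theta$. The following shows the general pairwise-distance form this family of estimators takes for a mixture of $M$ equally weighted components (our reading of the estimator family, not the paper's verbatim derivation):

$$
I(\mathbf{y};\theta) \;=\; H(\mathbf{y})-\mathbb{E}_{\theta}\big[H(\mathbf{y}\mid\theta)\big] \;\approx\; -\sum_{i=1}^{M}\pi_i \ln\!\sum_{j=1}^{M}\pi_j\, e^{-D(p_i\,\|\,p_j)},
$$

where $p_i$ is the predictive density of ensemble component $i$, $\pi_i=1/M$, and $D$ is a closed-form divergence such as KL. Identical components yield an estimate of $0$; maximally separated components yield $-\sum_i \pi_i \ln \pi_i = \ln M$, which for $M=5$ gives the $\ln 5\approx1.609$ ceiling seen in Figure 3.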

Paper Structure

This paper contains 17 sections, 12 equations, 13 figures, and 1 table.

Figures (13)

  • Figure 1: Image generation progression through DECU, each row refers to an ensemble component, for the class label of Bernese mountain dog with low epistemic uncertainty (a) and moving van with high epistemic uncertainty (b).
  • Figure 2: The ensemble pipeline for DECU, shown here with two components. During the reverse process, the previous latent vector $z^j_{t}$ passes through a UNet to yield $z^j_{t-1}$. Dashed lines signify the random selection of one ensemble component for rollout until the branching point. Our ensembles are constructed within the embedding layer, which accepts the class label as input. We create diversity through random initialization and by training each component on different subsets of the data. The encoders, decoders, and UNets for each component are shared, and we leverage the pretrained networks of Rombach et al. (2022). Notably, this reduces the number of parameters required for training from 456 million to 512 thousand.
  • Figure 3: Our estimator for epistemic uncertainty increases with distance from the branch point, converging to $-\ln\frac{1}{5}=\ln 5\approx1.609$, the maximum attainable mutual information for a five-component ensemble (see the numerical sketch after this figure list).
  • Figure 4: The left image displays low epistemic uncertainty image generation (bin 1300) for five class labels: bullfrog, carbonara, grey fox, container ship, and yellow lady's slipper. The right image shows high epistemic uncertainty image generation (bin 1) for cleaver, Sealyham terrier, lotion, shoji, and whiskey jug. Each row represents an ensemble component with $b=1000$.
  • Figure 5: This figure displays uncertainty distributions for each bin, derived from corresponding class uncertainty estimates.
  • ...and 8 more figures
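
The convergence ceiling in Figure 3 can be reproduced with a toy version of the pairwise-distance estimator. The numpy sketch below assumes each of five equally weighted ensemble components predicts a diagonal Gaussian in latent space (an illustrative choice, not necessarily the paper's exact parameterization): identical components give an estimate near 0, and well-separated components saturate at $\ln 5\approx1.609$.

```python
import numpy as np

def kl_diag_gauss(mu1, var1, mu2, var2):
    """Closed-form KL divergence between diagonal Gaussians, summed over dims."""
    return 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def paide_mutual_information(mus, varis):
    """Pairwise-distance MI estimate for an equally weighted M-component
    Gaussian mixture: -(1/M) * sum_i log((1/M) * sum_j exp(-KL(p_i || p_j)))."""
    m = len(mus)
    mi = 0.0
    for i in range(m):
        inner = np.mean([np.exp(-kl_diag_gauss(mus[i], varis[i], mus[j], varis[j]))
                         for j in range(m)])
        mi -= np.log(inner) / m
    return mi

rng = np.random.default_rng(0)
mu = rng.normal(size=16)                 # toy 16-dimensional latent mean
ones = [np.ones(16)] * 5                 # unit variances for all 5 components

print(paide_mutual_information([mu] * 5, ones))                             # ~0.0
print(paide_mutual_information([mu + 100.0 * k for k in range(5)], ones))   # ~1.609
```

Before the branch point all components share one rollout, so their predictions coincide and the estimate sits near zero; after branching, the per-component class embeddings pull the latents apart and the estimate climbs toward the $\ln 5$ ceiling.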

Theorems & Definitions (1)

  • proof