BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng

TL;DR

BayesDiff addresses the lack of a sample-wise quality metric for diffusion-generated images by estimating pixel-wise Bayesian uncertainty during image generation. It leverages a last-layer Laplace approximation to quantify predictive uncertainty of the noise predictor and derives an uncertainty iteration principle to propagate uncertainty through the reverse diffusion process. The approach enables image-level filtering, diverse augmentation, and artifact rectification for text-to-image tasks, with an efficient variant (BayesDiff-Skip) to reduce computational cost. Across multiple backbones and samplers, higher pixel-wise uncertainty correlates with clutter and misalignment, while uncertainty-guided resampling can rectify artifacts, demonstrating practical utility in real-world diffusion workflows.
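The idea of propagating uncertainty through the reverse process can be illustrated with a toy example. The sketch below is not the paper's uncertainty iteration; it only applies the standard variance identity for a linear update, Var(a·x + b·ε) = a²·Var(x) + b²·Var(ε) + 2ab·Cov(x, ε), independently at each pixel. The coefficients `a` and `b`, the random stand-in for the noise predictor's predictive variance, the zero covariance term, and the zero initial variance are all assumptions made for illustration.

```python
import numpy as np


def propagate_variance(var_x, var_eps, cov_x_eps, a, b):
    """Per-pixel variance of the linear update a * x + b * eps."""
    return a ** 2 * var_x + b ** 2 * var_eps + 2.0 * a * b * cov_x_eps


def run_toy_chain(shape=(4, 4), steps=5, a=0.9, b=-0.1, seed=0):
    """Push a per-pixel variance map through a few linear updates.

    Hypothetical setup: a and b are made-up sampler coefficients, and
    var_eps is random noise standing in for a Laplace-based predictive
    variance of the noise predictor.
    """
    rng = np.random.default_rng(seed)
    # Assumption: the initial point is treated as given, so it carries no variance.
    var_x = np.zeros(shape)
    for _ in range(steps):
        var_eps = rng.uniform(0.0, 0.05, size=shape)
        # Simplifying assumption: the covariance between x and eps is ignored.
        cov_x_eps = np.zeros(shape)
        var_x = propagate_variance(var_x, var_eps, cov_x_eps, a, b)
    return var_x


final_var = run_toy_chain()
print(final_var.shape)
```

In a real sampler the coefficients would come from the noise schedule and the covariance term would generally not vanish, so this toy chain only shows how a per-pixel variance map accumulates across steps.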

Abstract

Diffusion models have impressive image generation capability, but low-quality generations still exist, and their identification remains challenging due to the lack of a proper sample-wise metric. To address this, we propose BayesDiff, a pixel-wise uncertainty estimator for generations from diffusion models based on Bayesian inference. In particular, we derive a novel uncertainty iteration principle to characterize the uncertainty dynamics in diffusion, and leverage the last-layer Laplace approximation for efficient Bayesian inference. The estimated pixel-wise uncertainty can not only be aggregated into a sample-wise metric to filter out low-fidelity images but also aids in augmenting successful generations and rectifying artifacts in failed generations in text-to-image tasks. Extensive experiments demonstrate the efficacy of BayesDiff and its promise for practical applications.
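As a rough illustration of how a pixel-wise uncertainty map could be aggregated into a sample-wise score and used for filtering, consider the sketch below. The function names, the choice of summation as the aggregation rule, and the `keep_fraction` parameter are assumptions for this example, not details taken from the abstract.

```python
import numpy as np


def sample_uncertainty(pixel_var):
    """Collapse per-pixel variance maps of shape (N, ...) into one score per sample.

    Assumption: the aggregation is a plain sum over all non-batch axes.
    """
    return pixel_var.reshape(pixel_var.shape[0], -1).sum(axis=1)


def filter_low_uncertainty(images, pixel_var, keep_fraction=0.8):
    """Keep the keep_fraction of samples with the lowest aggregated uncertainty."""
    scores = sample_uncertainty(pixel_var)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    # argsort is ascending, so the first n_keep indices are the least uncertain;
    # sorting them afterwards preserves the original sample order.
    keep_idx = np.sort(np.argsort(scores)[:n_keep])
    return images[keep_idx], keep_idx


# Hypothetical data: four 2x2 "images" with constant per-pixel variance.
images = np.arange(16, dtype=float).reshape(4, 2, 2)
pixel_var = np.stack([np.full((2, 2), v) for v in (0.3, 0.1, 0.9, 0.2)])
kept, kept_idx = filter_low_uncertainty(images, pixel_var, keep_fraction=0.5)
print(kept_idx)
```

Filtering by rank rather than by a fixed threshold avoids having to calibrate the score's scale, which would likely vary with resolution and backbone.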


Paper Structure

This paper contains 20 sections, 40 equations, 12 figures, 1 table, and 2 algorithms.

Figures (12)

  • Figure 1: Given an initial point ${\bm{x}}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, our BayesDiff framework incorporates uncertainty into the denoising process and generates images with pixel-wise uncertainty estimates.
  • Figure 2: A study on the reliability of the BayesDiff-Skip algorithm. The images that BayesDiff assigns the highest uncertainty are also assigned high uncertainty by the BayesDiff-Skip algorithm.
  • Figure 3: The images with the highest (left) and lowest (right) uncertainty among 5000 unconditional generations of U-ViT model trained on ImageNet at $256\times 256$ resolution.
  • Figure 4: The images with the highest (left) and lowest (right) uncertainty among 80 generations on Stable Diffusion at $512\times 512$ resolution.
  • Figure 5: FID, Precision, and Recall scores of 5 groups of generations, ordered by descending uncertainty, on the CelebA and ImageNet datasets. The results show a strong correlation between our sample-wise uncertainty metric and traditional distributional metrics.
  • ...and 7 more figures