Diffusion Model Guided Sampling with Pixel-Wise Aleatoric Uncertainty Estimation

Michele De Vita, Vasileios Belagiannis

TL;DR

We address the lack of quantitative uncertainty in diffusion-model image generation by introducing a training-free, pixel-wise aleatoric uncertainty estimate computed as the variance of denoising scores under a diffusion-specific perturbation during sampling. This uncertainty is shown to be linked to the second derivative (curvature) of the noising distribution, enabling a principled guidance of the sampling process to emphasize high-uncertainty regions and improve sample quality. Through extensive experiments on ImageNet and CIFAR-10, the method both filters out low-quality samples and achieves better FID scores compared to baselines like MC-Dropout and BayesDiff, with substantially fewer function evaluations. The approach is scheduler-agnostic and compatible with unconditional, class-conditional, and text-to-image diffusion models, offering a practical, training-free tool for improving diffusion-based generation and reliability.

Abstract

Despite the remarkable progress in generative modelling, current diffusion models lack a quantitative approach to assess image quality. To address this limitation, we propose to estimate the pixel-wise aleatoric uncertainty during the sampling phase of diffusion models and utilise the uncertainty to improve the sample generation quality. The uncertainty is computed as the variance of the denoising scores with a perturbation scheme that is specifically designed for diffusion models. We then show that the aleatoric uncertainty estimates are related to the second-order derivative of the diffusion noise distribution. We evaluate our uncertainty estimation algorithm and the uncertainty-guided sampling on the ImageNet and CIFAR-10 datasets. In our comparisons with related work, we demonstrate promising results in filtering out low-quality samples. Furthermore, we show that our guided approach leads to better sample generation in terms of FID scores.

Paper Structure

This paper contains 38 sections, 15 equations, 13 figures, 6 tables, 2 algorithms.

Figures (13)

  • Figure 1: Visual Results I. We provide qualitative samples of uncertainty guidance applied to Stable Diffusion 3 (Esser et al., 2024) (first two columns) and Stable Diffusion 1.5 (Rombach et al., 2022) (last two columns). The upper row shows images produced without uncertainty guidance, while the bottom row shows images generated with it. The images generated with uncertainty guidance present fewer artefacts and are more faithful.
  • Figure 2: Illustration of our uncertainty estimation algorithm for the timestep $t$. We compute the uncertainty of the denoising process at step $t$ by first computing an approximation of the denoised image $\hat{\mathbf{X}}_0$ and then sampling from the distribution $q(\hat{\mathbf{X}}_t|\hat{\mathbf{X}}_0)$ multiple times. The variance of the scores $\varepsilon_\theta(\hat{\mathbf{X}}_t, t)$ is then computed as the uncertainty of the image at step $t$.
  • Figure 3: We present posterior uncertainty in pixel (left) and latent (right) spaces. The blue line shows average uncertainty over 60,000 samples, with standard deviation in the surrounding blue area. This pattern was consistent across all evaluated models.
  • Figure 4: Left: image generated by a DDPM trained on ImageNet64, using the DDIM sampler with 50 steps. Right: uncertainty map of the generated image. The uncertainty map is obtained by summing the step-wise uncertainty of the sampling process. We observe that most of the uncertainty is concentrated in the foreground elements of the image.
  • Figure 5: Left: image generated by a DDPM trained on ImageNet64, using the DDIM sampler with 50 steps. Right: MC-Dropout uncertainty map of the generated image. The uncertainty map is obtained by summing the step-wise uncertainty of the sampling process. We observe that most of the uncertainty is concentrated at the edges of the foreground elements of the image.
  • ...and 8 more figures
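
The estimation procedure described in the Figure 2 caption, together with the step-wise summation mentioned for Figure 4, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the standard DDPM forward process $q(\mathbf{X}_t|\mathbf{X}_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\,\mathbf{X}_0, (1-\bar{\alpha}_t)\mathbf{I})$, and the names `stepwise_uncertainty`, `toy_eps_model`, `alpha_bar_t` and `num_samples` are hypothetical. The toy denoiser stands in for a trained network $\varepsilon_\theta$, and the same noisy input is reused at every step purely to keep the example short.

```python
import numpy as np


def stepwise_uncertainty(eps_model, x_t, t, alpha_bar_t, num_samples=8, rng=None):
    """Pixel-wise variance of the predicted noise under re-noising of x0_hat."""
    rng = np.random.default_rng() if rng is None else rng
    sqrt_ab = np.sqrt(alpha_bar_t)
    sqrt_one_minus_ab = np.sqrt(1.0 - alpha_bar_t)

    # Approximate the denoised image from the current noisy sample
    # (standard DDPM identity; assumed here, not taken from the paper).
    eps_hat = eps_model(x_t, t)
    x0_hat = (x_t - sqrt_one_minus_ab * eps_hat) / sqrt_ab

    # Draw several samples from q(x_t | x0_hat) and evaluate the model on each.
    scores = []
    for _ in range(num_samples):
        noise = rng.standard_normal(x_t.shape)
        x_t_hat = sqrt_ab * x0_hat + sqrt_one_minus_ab * noise
        scores.append(eps_model(x_t_hat, t))

    # Variance across the perturbed predictions, one value per pixel.
    return np.var(np.stack(scores, axis=0), axis=0)


def toy_eps_model(x, t):
    # Hypothetical stand-in for a trained denoiser eps_theta(x, t).
    return np.tanh(x)


rng = np.random.default_rng(0)
x_t = rng.standard_normal((3, 8, 8))      # one 3x8x8 "image"
alpha_bars = np.linspace(0.1, 0.9, 5)     # assumed cumulative noise schedule

# Sum the step-wise maps into a single uncertainty map.
# A real sampler would update x_t between steps; it is held fixed here.
step_maps = [
    stepwise_uncertainty(toy_eps_model, x_t, t, ab, rng=rng)
    for t, ab in enumerate(alpha_bars)
]
uncertainty_map = np.sum(step_maps, axis=0)
```

In this sketch, each step costs `num_samples + 1` model evaluations, and `uncertainty_map` has the same shape as the image, so it could be reduced to a scalar (for example by summing over pixels) if a per-sample score is wanted for filtering.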