Benchmarking Uncertainty Quantification of Plug-and-Play Diffusion Priors for Inverse Problems Solving

Xiaoyu Qiu, Taewon Yang, Zhanhao Liu, Guanyang Wang, Liyue Shen

TL;DR

The work addresses the gap in evaluating uncertainty for Plug-and-Play Diffusion Priors in inverse problems, where the target is a posterior distribution rather than a single reconstruction. It introduces a unified uncertainty-aware benchmark combining a controlled toy model with a taxonomy of solvers and extensive real-data experiments (including OOD tasks) to study epistemic and aleatoric uncertainty. Key findings show that posterior-targeting solvers tend to calibrate uncertainty more faithfully than heuristic or MAP-like methods, while accuracy and uncertainty can be decoupled and uncertainty generally grows with data sparsity; OOD scenarios reveal distinct, task-specific uncertainty patterns. The paper provides practical guidance for evaluating and designing UQ-conscious diffusion samplers, highlighting the need for uncertainty-aware benchmarks in scientific inverse problems.

Abstract

Plug-and-play diffusion priors (PnPDP) have become a powerful paradigm for solving inverse problems in scientific and engineering domains. Yet, current evaluations of reconstruction quality emphasize point-estimate accuracy metrics on a single sample, which do not reflect the stochastic nature of PnPDP solvers and the intrinsic uncertainty of inverse problems, critical for scientific tasks. This creates a fundamental mismatch: in inverse problems, the desired output is typically a posterior distribution and most PnPDP solvers induce a distribution over reconstructions, but existing benchmarks only evaluate a single reconstruction, ignoring distributional characterization such as uncertainty. To address this gap, we conduct a systematic study to benchmark the uncertainty quantification (UQ) of existing diffusion inverse solvers. Specifically, we design a rigorous toy model simulation to evaluate the uncertainty behavior of various PnPDP solvers, and propose a UQ-driven categorization. Through extensive experiments on toy simulations and diverse real-world scientific inverse problems, we observe uncertainty behaviors consistent with our taxonomy and theoretical justification, providing new insights for evaluating and understanding the uncertainty for PnPDPs.

Paper Structure

This paper contains 55 sections, 3 theorems, 14 equations, 17 figures, 7 tables, 2 algorithms.

Key Result

Theorem B.2

Consider running $K$ iterations of PnP-DM with a constant coupling $\rho_k\equiv\rho>0$ and a score estimate $s_t$. Let $t^\ast>0$ satisfy $\sigma(t^\ast)=\rho$, and define as in wu2024PnPDM. Let $\nu_\tau$ and $\pi_\tau$ denote the distributions at time $\tau$ of the non-stationary and stationary processes, respectively. Over $\tau\in[0,T]$ with $T:=K(t^\ast+1)$, we have: where $\pi_X$ and $\nu

Figures (17)

  • Figure 1: Illustration of the Accuracy Trap phenomenon and three types of uncertainty behavior. Blue contours show the ground-truth posterior $p(x \mid y)$, which can be multimodal. The red star denotes the ground truth $x^*$; $\hat{x}_1$ and $\hat{x}_3$ are posterior-plausible reconstructions, while $\hat{x}_2$ is an off-posterior reconstruction that can be closer to $x^*$ than $\hat{x}_3$. The posterior-targeting sampler (red) aims to match the posterior distribution and can represent uncertainty across modes. The heuristic sampler (orange) may generate samples both on and off the posterior support. The MAP-like sampler (purple) concentrates around a single mode.
  • Figure 2: Similar reconstructions with distinct uncertainty. Comparison of PnPDP solvers on linear inverse scattering, with $K = 100$ reconstructions per solver. Top two rows: the methods produce similar reconstruction quality (in PSNR). Bottom row: the pixel-wise variance maps reveal a fundamental difference. REDDiff (mardani2023reddiff1) exhibits the lowest variance, while DPS (chung2022diffusionDPS) shows the highest uncertainty in structure-rich areas.
  • Figure 3: Average 95% coverage across methods (bars), with mean interval width marked as dots. The dashed line indicates the nominal 0.95 target; bars around 0.95 indicate better calibration, while higher dots indicate wider intervals.
  • Figure 4: Observed- and null-space posterior variances shown as paired bars, with the null/observed ratio shown as dots. Dashed lines mark the theoretical variances and the solid line marks the theoretical null/observed ratio; points or bars beyond the axis limit indicate overflow.
  • Figure 5: Accuracy–UQ scatterplots. Left: RMSE versus average coverage in Exp.1 (A=I). Right: RMSE versus null/observed variance ratio in Exp.2 (0-1 singular value). Points are color‑grouped by method family; lower RMSE indicates better reconstruction accuracy, while coverage/ratio summarizes uncertainty calibration.
  • ...and 12 more figures
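
The quantities named in the captions above (pixel-wise variance across repeated reconstructions, average 95% coverage, and mean interval width) can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the function names, the Gaussian toy setup with $A = I$, the noise level, and the sample counts are all assumptions chosen so that the exact posterior is known in closed form.

```python
import numpy as np


def pixelwise_variance(samples):
    """Per-coordinate sample variance over K reconstructions, shape (K, d) -> (d,)."""
    return samples.var(axis=0, ddof=1)


def coverage_and_width(samples, x_true, level=0.95):
    """Empirical coverage of x_true by per-coordinate quantile intervals,
    plus the mean width of those intervals."""
    alpha = 1.0 - level
    lo = np.quantile(samples, alpha / 2, axis=0)
    hi = np.quantile(samples, 1 - alpha / 2, axis=0)
    covered = (x_true >= lo) & (x_true <= hi)
    return covered.mean(), (hi - lo).mean()


def run_toy(seed=0, d=2000, K=500, sigma=0.5, shrink=1.0):
    """Hypothetical toy problem: x ~ N(0, I), y = x + sigma * noise (A = I).

    The exact posterior is N(y / (1 + sigma^2), sigma^2 / (1 + sigma^2) I).
    `shrink` < 1 scales the sampler's standard deviation to mimic an
    over-concentrated (MAP-like) solver; shrink = 1 samples the true posterior.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.standard_normal(d)
    y = x_true + sigma * rng.standard_normal(d)

    post_mean = y / (1.0 + sigma**2)
    post_std = np.sqrt(sigma**2 / (1.0 + sigma**2))

    # K stochastic "reconstructions" of the same measurement y.
    samples = post_mean + shrink * post_std * rng.standard_normal((K, d))

    coverage, width = coverage_and_width(samples, x_true)
    mean_var = pixelwise_variance(samples).mean()
    return coverage, width, mean_var


if __name__ == "__main__":
    for name, shrink in [("exact posterior", 1.0), ("over-concentrated", 0.25)]:
        cov, width, var = run_toy(shrink=shrink)
        print(f"{name}: coverage={cov:.3f}, width={width:.3f}, mean var={var:.3f}")
```

In this toy, a sampler that draws from the exact posterior should land near the nominal 0.95 coverage, while the artificially shrunk sampler reports narrower intervals and covers the ground truth far less often, even though both are centered on the same posterior mean.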

Theorems & Definitions (3)

  • Theorem B.2: PnP-DM convergence bound (Theorem 3.1 in wu2024PnPDM)
  • Theorem B.4: MCG-Diff target and particle approximation (Prop. 2.3 in cardoso2023monteMCGdiff)
  • Theorem B.6: Asymptotic consistency of FPS-SMC (dou2024fpsdiffusion)