Learning few-step posterior samplers by unfolding and distillation of diffusion models

Charlesquin Kemajou Mbakam, Jonathan Spence, Marcelo Pereyra

TL;DR

The paper tackles Bayesian image reconstruction under ill-posed forward models by using diffusion models (DMs) as priors. It introduces UD$^2$Ms, a framework that turns the LATINO Langevin sampler into a trainable, few-step conditional diffusion model via deep unfolding and LoRA-based distillation, so that a single model can handle multiple likelihoods at inference. By sampling from $p(\mathbf{x}_0|\mathbf{y},\mathbf{x}_t)$ through a learned proximal operator and a pre-trained DM prior, UD$^2$Ms achieve high accuracy with $\mathcal{O}(10)$ neural function evaluations, while remaining flexible enough to adapt to different forward models at test time. Extensive experiments on Gaussian/uniform/motion deblurring, inpainting, super-resolution (SR), and JPEG artifact removal on ImageNet and LSUN show gains in PSNR, LPIPS, and FID and robust generalization, with ablations showing the effect of unfolding depth, initialization, and LoRA rank. Overall, the approach combines the advantages of distillation and Plug-and-Play (PnP) strategies to deliver efficient, accurate posterior sampling for diverse inverse problems in computational imaging.

Abstract

Diffusion models (DMs) have emerged as powerful image priors in Bayesian computational imaging. Two primary strategies have been proposed for leveraging DMs in this context: Plug-and-Play methods, which are zero-shot and highly flexible but rely on approximations; and specialized conditional DMs, which achieve higher accuracy and faster inference for specific tasks through supervised training. In this work, we introduce a novel framework that integrates deep unfolding and model distillation to transform a DM image prior into a few-step conditional model for posterior sampling. A central innovation of our approach is the unfolding of a Markov chain Monte Carlo (MCMC) algorithm - specifically, the recently proposed LATINO Langevin sampler (Spagnoletti et al., 2025) - representing the first known instance of deep unfolding applied to a Monte Carlo sampling scheme. We demonstrate our proposed unfolded and distilled samplers through extensive experiments and comparisons with the state of the art, where they achieve excellent accuracy and computational efficiency, while retaining the flexibility to adapt to variations in the forward model at inference time.

Paper Structure

This paper contains 54 sections, 22 equations, 18 figures, 11 tables, 1 algorithm.

Figures (18)

  • Figure 1: Qualitative comparison of the proposed Unfolded and Distilled Diffusion Model (UD$^2$M) for posterior sampling on the ImageNet 256 dataset. Tasks: Gaussian deblurring, random inpainting ($70\%$), super-resolution ($4\times$), and restoration of JPEG compression artifacts (QF=10).
  • Figure 2: Diagram of the proposed conditional sampling architecture $L_\vartheta(y, x_t, t)$, derived by deep unfolding of $K$ LATINO iterations (Spagnoletti et al., 2025). The prior is introduced via a pre-trained unconditional DM $G_\theta$ with LoRA adaptation $\Delta_{\theta}$, while the observation model, the measurement $y$, and the noisy state $x_t$ enter through $g_{y,x_t} = -\log p(y,x_{t}|x_0)$. The first module is initialized by an estimator $f_\psi$ such as RAM (Terris et al., 2025), or by setting $f_\psi(x_t,y) = A^\dagger y$. The unfolded network is fine-tuned and distilled to sample from $({\mathbf{x}}_0|y,x_t)$.
  • Figure 3: Comparison of posterior samples for the task SR ($\times$4) with noise level $\sigma=0.01$ on ImageNet 256.
  • Figure 4: Comparison of posterior samples for the task Gaussian deblurring with noise level $\sigma=0.01$ on ImageNet 256.
  • Figure 5: Comparison of posterior samples for the task JPEG artifact removal (QF=10) with noise level $\sigma=0.01$ on ImageNet 256.
  • ...and 13 more figures
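
The architecture described in the Figure 2 caption can be illustrated with a small NumPy sketch. This is only a toy under stated assumptions, not the authors' implementation. It assumes a linear Gaussian observation model $y = A x_0 + \text{noise}$ with standard deviation `sigma`, a forward diffusion $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, and that $y$ and $x_t$ are conditionally independent given $x_0$, so that $g_{y,x_t}$ is a sum of two quadratics with a closed-form proximal operator. The re-noise, denoise, proximal-step ordering inside each module is also an assumption about the LATINO iteration, which this page does not spell out. The names `prox_g`, `toy_denoiser`, `unfolded_sampler`, `deltas`, and `noise_levels` are hypothetical, and `toy_denoiser` merely stands in for the LoRA-adapted DM $G_\theta$.

```python
import numpy as np


def prox_g(v, y, x_t, A, sigma, alpha_bar_t, delta):
    """Closed-form proximal operator of delta * g_{y,x_t} at the point v.

    Assumed (not taken from the paper):
      g(x0) = ||y - A x0||^2 / (2 sigma^2)
            + ||x_t - sqrt(alpha_bar_t) x0||^2 / (2 (1 - alpha_bar_t)).
    Setting the gradient of delta * g(x) + 0.5 * ||x - v||^2 to zero
    gives the linear system H x = rhs solved below.
    """
    n = v.shape[0]
    a = np.sqrt(alpha_bar_t)
    w_y = delta / sigma**2              # weight of the measurement term
    w_t = delta / (1.0 - alpha_bar_t)   # weight of the noisy-state term
    H = np.eye(n) + w_y * A.T @ A + w_t * alpha_bar_t * np.eye(n)
    rhs = v + w_y * A.T @ y + w_t * a * x_t
    return np.linalg.solve(H, rhs)


def toy_denoiser(z, s):
    """Hypothetical stand-in for G_theta: the exact posterior mean of x
    given z = x + s * noise when x has a standard normal prior."""
    return z / (1.0 + s**2)


def unfolded_sampler(y, x_t, A, sigma, alpha_bar_t, deltas, noise_levels,
                     denoiser, rng):
    """K unfolded modules; K = len(deltas). In a trained unfolded network
    the per-module parameters would be learned; here they are fixed."""
    x = np.linalg.pinv(A) @ y  # initialization f_psi(x_t, y) = A^dagger y
    for delta, s in zip(deltas, noise_levels):
        z = x + s * rng.standard_normal(x.shape)  # re-noise current iterate
        u = denoiser(z, s)                        # prior (denoising) step
        x = prox_g(u, y, x_t, A, sigma, alpha_bar_t, delta)  # data step
    return x
```

A call such as `unfolded_sampler(y, x_t, A, 0.1, 0.5, [0.3] * 4, [0.5, 0.3, 0.2, 0.1], toy_denoiser, np.random.default_rng(0))` runs four modules. Each module costs one denoiser evaluation, which is how a small unfolding depth $K$ keeps the number of neural function evaluations low.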