
Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models

Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros G. Dimakis, Sanjay Shakkottai

TL;DR

This work solves linear inverse problems with pre-trained latent diffusion models, extending earlier pixel-space posterior-sampling methods to latent space without task-specific fine-tuning. It introduces PSLD (Posterior Sampling with Latent Diffusion), a framework that adds a measurement-consistency step and a gluing objective to keep latent variables on the data manifold, with theoretical guarantees in a two-step linear diffusion setting. The theory establishes exact or robust recovery under mild subspace and measurement assumptions. In experiments, PSLD achieves state-of-the-art performance on inpainting, denoising, deblurring, destriping, and super-resolution, on both in-distribution and out-of-distribution data. Practically, this lets powerful foundation models such as Stable Diffusion be applied to diverse inverse problems without retraining, improving robustness and generalization at no extra training cost.
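As a rough illustration of the measurement-consistency and gluing terms, the sketch below only *evaluates* them for inpainting. Here $A$ is a 0/1 pixel mask, so $A^\top y$ is the zero-filled observation and $A^\top A$ is elementwise masking. We take the gluing term to be $\|\hat z_0 - E(A^\top y + (I - A^\top A)\,D(\hat z_0))\|^2$, which is our reading of PSLD. In an actual sampler both losses would be differentiated with respect to the current latent through the denoiser, and this sketch omits that step. `encode`, `decode`, and the identity "autoencoder" in the usage lines are hypothetical stand-ins, not Stable Diffusion's VAE.

```python
import numpy as np

# Hypothetical helpers for inpainting: `mask` is 1 on observed pixels and 0 elsewhere,
# so A^T y is the zero-filled observation `y_zf` and A^T A is elementwise masking.


def measurement_loss(z0_hat, decode, y_zf, mask):
    """||y - A D(z0_hat)||^2, evaluated on the observed pixels only."""
    residual = mask * (y_zf - decode(z0_hat))
    return float(np.sum(residual ** 2))


def glued_image(z0_hat, decode, y_zf, mask):
    """A^T y + (I - A^T A) D(z0_hat): observed pixels from y, the rest from the decoder."""
    return y_zf + (1.0 - mask) * decode(z0_hat)


def gluing_loss(z0_hat, encode, decode, y_zf, mask):
    """||z0_hat - E(glued image)||^2: zero when the latent re-encodes to itself after gluing."""
    glued = glued_image(z0_hat, decode, y_zf, mask)
    return float(np.sum((z0_hat - encode(glued)) ** 2))


def identity(v):
    """Toy stand-in for both the encoder and the decoder (illustration only)."""
    return v


# Toy usage on a 4-pixel "image" with pixels 0 and 2 observed.
x_true = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([1.0, 0.0, 1.0, 0.0])
y_zf = mask * x_true

print(measurement_loss(x_true, identity, y_zf, mask))        # 0.0
print(gluing_loss(x_true, identity, identity, y_zf, mask))   # 0.0

wrong = np.array([0.0, 2.0, 3.0, 4.0])  # first observed pixel is off by 1
print(measurement_loss(wrong, identity, y_zf, mask))         # 1.0
print(gluing_loss(wrong, identity, identity, y_zf, mask))    # 1.0
```

With a real autoencoder the two losses differ: the gluing term additionally penalizes latents whose decoded-and-glued image does not re-encode to the same latent.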

Abstract

We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.
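Each task listed in the abstract can be written as a linear measurement $y = A x$, plus noise in the denoising case. The NumPy sketch below gives one illustrative version of each operator on a single-channel image. The mask fraction, box position, 4x average pooling, 3x3 box blur, and every-other-column stripe pattern are assumptions chosen for illustration. They are not the operators used in the paper's experiments.

```python
import numpy as np

# Illustrative forward operators y = A(x) for a single-channel image x of shape (H, W).
# All parameters (mask fraction, box position, pooling factor, blur size, stripe period)
# are hypothetical choices.


def random_inpaint_mask(shape, frac_masked, seed=0):
    """1 = observed pixel, 0 = missing; roughly `frac_masked` of the pixels are dropped."""
    rng = np.random.default_rng(seed)
    return (rng.random(shape) >= frac_masked).astype(float)


def box_inpaint_mask(shape, top, left, size):
    """Hide a single size x size block."""
    mask = np.ones(shape)
    mask[top:top + size, left:left + size] = 0.0
    return mask


def destripe_mask(shape, period=2):
    """Hide every `period`-th column (one possible stripe pattern)."""
    mask = np.ones(shape)
    mask[:, ::period] = 0.0
    return mask


def downsample(x, factor=4):
    """Super-resolution operator: average-pool non-overlapping factor x factor blocks."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def box_blur(x, k=3):
    """Deblurring operator: k x k uniform blur (odd k) with reflective padding."""
    pad = k // 2
    xp = np.pad(x, pad, mode="reflect")
    out = np.zeros_like(x, dtype=float)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out / (k * k)


def measure(x, op, sigma=0.0, seed=0):
    """y = A(x) + sigma * noise; denoising is the case where op is the identity."""
    rng = np.random.default_rng(seed)
    y = op(x)
    return y + sigma * rng.standard_normal(y.shape)
```

For instance, `measure(x, lambda v: random_inpaint_mask(v.shape, 0.7) * v)` drops roughly 70% of the pixels of `x`, the kind of masking level varied in Figure 5.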

Paper Structure

This paper contains 18 sections, 11 theorems, 39 equations, 17 figures, 5 tables, and 2 algorithms.

Key Result

Theorem 3.3

Suppose Assumption assm:ortho holds. Let […]. For a fixed variance $\beta > 0$, if $\mu_{\bm{\theta}}\left( \overrightarrow{{\bm{x}}_1}\left(\overrightarrow{{\bm{x}}_0}, \overrightarrow{\bm\epsilon}\right) \right) \coloneqq {\bm{\theta}} \overrightarrow{{\bm{x}}_1}\left(\overrightarrow{{\bm{x}}_0}, \overrightarrow{\bm\epsilon}\right)$, then the […]
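The theorem concerns a two-step linear diffusion whose reverse-step mean is the linear map $\mu_{\bm\theta}(\overrightarrow{{\bm{x}}_1}) = \bm\theta\,\overrightarrow{{\bm{x}}_1}$. As a hedged numerical sketch of that setting only, assume (our choices, not necessarily the paper's) data $x_0 = S z$ with $z \sim \mathcal{N}(0, I)$ and $S$ having orthonormal columns, one forward step $x_1 = \sqrt{1-\beta}\,x_0 + \sqrt{\beta}\,\epsilon$, and $\bm\theta$ fit by least squares to predict $x_0$ from $x_1$. Then $\operatorname{Cov}(x_1) = (1-\beta) S S^\top + \beta I$ and $\mathbb{E}[x_0 x_1^\top] = \sqrt{1-\beta}\, S S^\top$, so the optimal linear predictor is $\sqrt{1-\beta}\, S S^\top$, a scaled projector onto the data subspace. This is a standard linear-MMSE calculation, not a restatement of the theorem. `fit_linear_denoiser` is a hypothetical helper.

```python
import numpy as np


def fit_linear_denoiser(S, beta, n=50_000, seed=0):
    """Least-squares fit of theta in x0 ≈ theta @ x1 for one assumed forward step."""
    rng = np.random.default_rng(seed)
    d, k = S.shape
    z = rng.standard_normal((n, k))
    x0 = z @ S.T                                          # clean samples on span(S), one per row
    eps = rng.standard_normal((n, d))
    x1 = np.sqrt(1.0 - beta) * x0 + np.sqrt(beta) * eps   # assumed forward (noising) step
    theta_T, *_ = np.linalg.lstsq(x1, x0, rcond=None)     # solves x1 @ theta_T ≈ x0
    return theta_T.T


rng = np.random.default_rng(42)
S, _ = np.linalg.qr(rng.standard_normal((6, 2)))          # 6x2 matrix with orthonormal columns
beta = 0.3
theta = fit_linear_denoiser(S, beta)
target = np.sqrt(1.0 - beta) * S @ S.T                    # scaled projector onto span(S)
print(np.max(np.abs(theta - target)))                     # sampling error, shrinks as n grows
```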

Figures (17)

  • Figure 1: Overall pipeline of our proposed framework, from left to right. Given an image (left) and a user-defined mask (center), our algorithm inpaints the masked region (right). The known parts of the image are left unaltered (see the appendix on additional experiments for the web demo and image sources).
  • Figure 2: Inpainting results on general-domain images from the web (see the appendix on additional experiments for image sources), comparing our model with state-of-the-art commercial inpainting services that use the same foundation model (Stable Diffusion v-1.5).
  • Figure 3: Left panel: random inpainting on FFHQ 256 images [ffhq] using PSLD with Stable Diffusion v-1.5; note the text in the top row and the facial expression in the bottom row. Right panel: block ($128\times 128$) inpainting using the LDM-VQ-4 model trained on FFHQ 256 [ffhq]; note the glasses in the top row and the eyes in the bottom row.
  • Figure 4: Inpainting (random and box) results on out-of-distribution $256 \times 256$ samples (see the appendix on additional experiments for image sources), using PSLD with Stable Diffusion v-1.5 as the generative foundation model.
  • Figure 5: DPS vs. PSLD in random inpainting on FFHQ 256 [ffhq, dps] as the percentage of masked pixels increases; PSLD with Stable Diffusion outperforms DPS.
  • ...and 12 more figures

Theorems & Definitions (12)

  • Theorem 3.3: Generative Modeling using Diffusion in Pixel Space [rout2023theoretical]
  • Theorem 3.4: Posterior Sampling using Diffusion in Pixel Space
  • Proposition 3.5: Variational Autoencoder
  • Theorem 3.6: Generative Modeling using Diffusion in Latent Space
  • Theorem 3.7: Posterior Sampling using Goodness Modified Latent DPS
  • Theorem 3.8: Posterior Sampling using Diffusion in Latent Space
  • Theorem A.1: Posterior Sampling using Diffusion in Pixel Space
  • Proposition A.2: Variational Autoencoder
  • Proof
  • Theorem A.3: Generative Modeling using Diffusion in Latent Space
  • ...and 2 more
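Several of the results listed above concern posterior sampling in latent space, and the TL;DR places the theory in a two-step linear setting. As a hedged toy of that flavor (our own construction, not the paper's sampler or proof), the sketch below assumes a hypothetical linear autoencoder with decoder $D(z) = S z$ and encoder $E(x) = S^\top x$ for an orthonormal $S$, a 0/1 inpainting mask $A$, and no diffusion prior at all. It runs gradient descent on the measurement-consistency term plus the gluing term. Because everything is linear, both gradients are written out by hand. `latent_posterior_toy` and its step sizes are hypothetical.

```python
import numpy as np


def latent_posterior_toy(S, mask, y_zf, eta=0.25, gamma=0.25, steps=200):
    """Gradient descent on ||y - A D(z)||^2 plus the gluing term, with D(z) = S z, E(x) = S^T x.

    Toy assumptions: S has orthonormal columns, A is a 0/1 pixel mask (y_zf = A^T y),
    and there is no diffusion prior.
    """
    k = S.shape[1]
    M = S.T @ (mask[:, None] * S)            # S^T diag(mask) S
    z = np.zeros(k)
    for _ in range(steps):
        # Measurement consistency: gradient of ||mask * (y_zf - S z)||^2.
        grad_meas = -2.0 * S.T @ (mask * (y_zf - S @ z))
        # Gluing: g(z) = z - E(A^T y + (I - A^T A) D(z)); its Jacobian is M when S^T S = I.
        glued = y_zf + (1.0 - mask) * (S @ z)
        g = z - S.T @ glued
        grad_glue = 2.0 * M @ g
        z = z - eta * grad_meas - gamma * grad_glue
    return z


# Data live on a 2-D subspace of R^4; only the first two pixels are observed.
S = 0.5 * np.array([[1.0, 1.0],
                    [1.0, -1.0],
                    [1.0, 1.0],
                    [1.0, -1.0]])           # orthonormal columns
mask = np.array([1.0, 1.0, 0.0, 0.0])
z_true = np.array([2.0, -1.0])
x_true = S @ z_true                         # [0.5, 1.5, 0.5, 1.5]
y_zf = mask * x_true                        # zero-filled A^T y

z_hat = latent_posterior_toy(S, mask, y_zf)
print(np.round(S @ z_hat, 6))               # decoded image, including the unobserved pixels
```

Here the two observed pixels already pin down the 2-dimensional latent, so the toy only shows the two correction terms driving the decoded image to a measurement-consistent, on-subspace solution. It says nothing about PSLD's behavior with a learned diffusion prior.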