Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models

Litu Rout; Negin Raoof; Giannis Daras; Constantine Caramanis; Alexandros G. Dimakis; Sanjay Shakkottai

TL;DR

This work addresses solving linear inverse problems by leveraging pre-trained latent diffusion models, extending prior pixel-space approaches to the latent space without task-specific finetuning. It introduces PSLD, a latent-diffusion posterior-sampling framework that incorporates a measurement-consistency step and a gluing objective to keep latent variables on the data manifold, with theoretical guarantees in a two-step linear diffusion setting. Theoretical results establish exact or robust recovery under mild subspace and measurement assumptions, while experiments show PSLD achieving state-of-the-art performance across inpainting, denoising, deblurring, destriping, and super-resolution on both in-distribution and out-of-distribution data. Practically, this enables using powerful foundation models like Stable Diffusion for diverse inverse problems without retraining, enhancing robustness and generalization without extra training costs.
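The two ingredients named above, a measurement-consistency term and a gluing term, can be illustrated in a toy linear setting. The following is a minimal sketch, not the authors' implementation: it assumes a linear decoder with orthonormal columns whose encoder is its transpose, and the names `psld_style_loss`, `decode`, `encode`, and the weight `gamma` are hypothetical. It only shows that both terms vanish at a latent that is consistent with the measurements.

```python
import numpy as np

def psld_style_loss(z, y, A, decode, encode, gamma=1.0):
    """Toy two-term objective: measurement consistency plus gluing.

    All names and the weight `gamma` are illustrative assumptions.
    """
    x = decode(z)  # map the latent to pixel space
    # Measurement consistency: the decoded sample should explain y = A x.
    consistency = np.sum((y - A @ x) ** 2)
    # Gluing: paste the observed part (A^T y) onto the decoded sample,
    # re-encode, and ask the latent to be a fixed point of that round trip.
    glued = A.T @ y + x - A.T @ (A @ x)
    gluing = np.sum((z - encode(glued)) ** 2)
    return consistency + gamma * gluing

rng = np.random.default_rng(0)
d, k = 8, 3
# Assumed linear decoder with orthonormal columns; the encoder is its transpose.
D, _ = np.linalg.qr(rng.standard_normal((d, k)))
decode = lambda z: D @ z
encode = lambda x: D.T @ x

# Inpainting-style operator: observe the first 5 of 8 coordinates.
A = np.eye(d)[:5]

z_true = rng.standard_normal(k)
y = A @ decode(z_true)

loss_true = psld_style_loss(z_true, y, A, decode, encode)
loss_other = psld_style_loss(z_true + 1.0, y, A, decode, encode)
```

In this sketch `loss_true` is zero up to floating-point error, while a perturbed latent incurs a positive loss. In a diffusion sampler, the gradient of such an objective would be used to steer each reverse step; that part is omitted here.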

Abstract

We present the first framework to solve linear inverse problems leveraging pre-trained latent diffusion models. Previously proposed algorithms (such as DPS and DDRM) only apply to pixel-space diffusion models. We theoretically analyze our algorithm showing provable sample recovery in a linear model setting. The algorithmic insight obtained from our analysis extends to more general settings often considered in practice. Experimentally, we outperform previously proposed posterior sampling algorithms in a wide variety of problems including random inpainting, block inpainting, denoising, deblurring, destriping, and super-resolution.
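Each task listed above is a linear inverse problem: the measurement is $y = A x$ (plus noise) for a known matrix $A$. As a minimal sketch under simplifying assumptions (1-D signals, average pooling as the downsampling operator, and hypothetical helper names), two such operators could be built as follows; denoising corresponds to $A = I$ with additive noise.

```python
import numpy as np

def inpainting_operator(mask):
    """Rows of the identity matrix for the observed (mask == 1) entries."""
    mask = np.asarray(mask).ravel().astype(bool)
    return np.eye(mask.size)[mask]

def super_resolution_operator(n, factor):
    """Average-pool a length-n signal by `factor` (n must be divisible by factor)."""
    A = np.zeros((n // factor, n))
    for i in range(n // factor):
        A[i, i * factor:(i + 1) * factor] = 1.0 / factor
    return A

x = np.arange(8, dtype=float)              # toy "image" flattened to a vector
A_inp = inpainting_operator([1, 1, 0, 0, 1, 0, 1, 1])
A_sr = super_resolution_operator(8, factor=2)

y_inp = A_inp @ x                          # observed entries only
y_sr = A_sr @ x                            # 2x downsampled signal
```

Both operators have fewer rows than columns, so many signals $x$ explain the same $y$; this is why a generative prior is needed to pick a plausible reconstruction.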

Paper Structure (18 sections, 11 theorems, 39 equations, 17 figures, 5 tables, 2 algorithms)

Introduction
Background and Method
Method
Theoretical Results
Problem Setup
Posterior Sampling using Pixel-space Diffusion Model
Posterior Sampling using Latent Diffusion Model
Experimental Evaluation
Conclusion
Technical Proofs
Proof of Theorem \ref{thm:ps-adm}
Proof of Proposition \ref{prop:vae}
Proof of Theorem \ref{thm:gen-ldm}
Proof of Theorem \ref{thm:ps-ldm}
Proof of Theorem \ref{thm:ps-ldm-robust}
...and 3 more sections

Key Result

Theorem 3.3

Suppose Assumption \ref{assm:ortho} holds. For a fixed variance $\beta > 0$, if $\mu_{\bm{\theta}}\left( \overrightarrow{{\bm{x}}_1}\left(\overrightarrow{{\bm{x}}_0}, \overrightarrow{\bm\epsilon}\right) \right )\coloneqq {\bm{\theta}} \overrightarrow{{\bm{x}}_1}\left(\overrightarrow{{\bm{x}}_0}, \overrightarrow{\bm\epsilon}\right)$, then the

Figures (17)

Figure 1: Overall pipeline of our proposed framework, from left to right. Given an image (left) and a user-defined mask (center), our algorithm inpaints the masked region (right). The known parts of the image are unaltered (see Appendix \ref{sec-addn-exps} for web demo and image sources).
Figure 2: Inpainting results on general-domain images from the web (see Appendix \ref{sec-addn-exps} for image sources). Our model is compared with state-of-the-art commercial inpainting services that leverage the same foundation model (Stable Diffusion v-1.5).
Figure 3: Left panel: Random inpainting on images from FFHQ 256 \cite{ffhq} using PSLD with Stable Diffusion v-1.5. Notice the text in the top row and the facial expression in the bottom row. Right panel: Block ($128\times 128$) inpainting, using the LDM-VQ-4 model trained on FFHQ 256 \cite{ffhq}. Notice the glasses in the top row and the eyes in the bottom row.
Figure 4: Inpainting (random and box) results on out-of-distribution samples, $256 \times 256$ (see Appendix \ref{sec-addn-exps} for image sources). We use PSLD with Stable Diffusion v-1.5 as the generative foundation model.
Figure 5: Comparing DPS and PSLD performance in random inpainting on FFHQ 256 \cite{ffhq,dps}, as the percentage of masked pixels increases. PSLD with Stable Diffusion outperforms DPS.
...and 12 more figures

Theorems & Definitions (12)

Theorem 3.3: Generative Modeling using Diffusion in Pixel Space \cite{rout2023theoretical}
Theorem 3.4: Posterior Sampling using Diffusion in Pixel Space
Proposition 3.5: Variational Autoencoder
Theorem 3.6: Generative Modeling using Diffusion in Latent Space
Theorem 3.7: Posterior Sampling using Goodness Modified Latent DPS
Theorem 3.8: Posterior Sampling using Diffusion in Latent Space
Theorem A.1: Posterior Sampling using Diffusion in Pixel Space
Proposition A.2: Variational Autoencoder
proof
Theorem A.3: Generative Modeling using Diffusion in Latent Space
...and 2 more
