Blind Inversion using Latent Diffusion Priors

Weimin Bai, Siyi Chen, Wenzheng Chen, He Sun

TL;DR

This work tackles blind inverse problems where the forward operator is unknown by introducing LatentDEM, which embeds powerful latent diffusion priors into a variational EM framework to jointly estimate the hidden signal $\boldsymbol{x}$ and forward model parameters $\boldsymbol{\phi}$. The E-step performs posterior sampling in latent space with an annealing-consistent data term, while the M-step updates $\boldsymbol{\phi}$ via MAP estimation; a skip-gradient strategy further accelerates training. The method demonstrates strong 2D performance on blind motion deblurring and extends to non-linear 3D inverse rendering for pose-free sparse-view reconstruction, achieving improved view-consistency and novel-view quality. These results show that leveraging latent diffusion priors within EM enables robust blind inversion across 2D and 3D tasks, with practical impact on imaging and rendering where forward models are imperfect or unavailable.
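One common way to write down the setup summarized above is sketched below. The notation is an assumption made for illustration, not taken from the paper's own equations: $\boldsymbol{y}$ denotes the observation, $\mathcal{A}_{\boldsymbol{\phi}}$ the forward operator parameterized by $\boldsymbol{\phi}$, $\boldsymbol{n}$ additive Gaussian noise with standard deviation $\sigma$, and $k$ the EM iteration index.

$$
\boldsymbol{y} = \mathcal{A}_{\boldsymbol{\phi}}(\boldsymbol{x}) + \boldsymbol{n}, \qquad \boldsymbol{n} \sim \mathcal{N}(\boldsymbol{0}, \sigma^2 \boldsymbol{I})
$$

$$
\text{E-step:}\quad \boldsymbol{x}^{(k)} \sim p\left(\boldsymbol{x} \mid \boldsymbol{y}, \boldsymbol{\phi}^{(k)}\right), \qquad
\text{M-step:}\quad \boldsymbol{\phi}^{(k+1)} = \arg\max_{\boldsymbol{\phi}} \; \mathbb{E}_{\boldsymbol{x}^{(k)}}\left[\log p\left(\boldsymbol{y} \mid \boldsymbol{x}^{(k)}, \boldsymbol{\phi}\right)\right] + \log p(\boldsymbol{\phi})
$$

Under this reading, the E-step treats the current $\boldsymbol{\phi}^{(k)}$ as if it were known and samples the signal, and the M-step is a MAP update of $\boldsymbol{\phi}$ given those samples. With a latent diffusion prior, the sampling would presumably run over a latent code that is decoded to $\boldsymbol{x}$, which this sketch does not spell out.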

Abstract

Diffusion models have emerged as powerful tools for solving inverse problems due to their exceptional ability to model complex prior distributions. However, existing methods predominantly assume known forward operators (i.e., non-blind), limiting their applicability in practical settings where acquiring such operators is costly. Additionally, many current approaches rely on pixel-space diffusion models, leaving the potential of more powerful latent diffusion models (LDMs) underexplored. In this paper, we introduce LatentDEM, an innovative technique that addresses more challenging blind inverse problems using latent diffusion priors. At the core of our method is solving blind inverse problems within an iterative Expectation-Maximization (EM) framework: (1) the E-step recovers clean images from corrupted observations using LDM priors and a known forward model, and (2) the M-step estimates the forward operator based on the recovered images. Additionally, we propose two novel optimization techniques tailored for LDM priors and EM frameworks, yielding more accurate and efficient blind inversion results. As a general framework, LatentDEM supports both linear and non-linear inverse problems. Beyond common 2D image restoration tasks, it enables new capabilities in non-linear 3D inverse rendering problems. We validate LatentDEM's performance on representative 2D blind deblurring and 3D sparse-view reconstruction tasks, demonstrating its superior efficacy over prior art.
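The alternating E-step/M-step structure described in the abstract can be illustrated with a deliberately tiny toy problem. The sketch below is not LatentDEM: it assumes a scalar unknown gain `phi` as the "forward operator", a Gaussian prior on the signal standing in for the LDM prior, and a closed-form Gaussian posterior in place of diffusion posterior sampling. The function name `toy_blind_em` and all parameter names are hypothetical.

```python
import numpy as np

def toy_blind_em(y, sigma, mu0, s0, phi_init=0.5, phi_prior=(1.0, 10.0),
                 n_iters=200, seed=0):
    """Monte Carlo EM for the toy model y_i = phi * x_i + n_i.

    Assumptions (illustrative only): x_i ~ N(mu0, s0^2) replaces the
    diffusion prior, n_i ~ N(0, sigma^2), and phi ~ N(phi0, tau^2).
    """
    rng = np.random.default_rng(seed)
    phi0, tau = phi_prior
    phi = phi_init
    for _ in range(n_iters):
        # E-step: sample x from p(x | y, phi), treating the current phi as known.
        post_var = 1.0 / (1.0 / s0**2 + phi**2 / sigma**2)
        post_mean = post_var * (mu0 / s0**2 + phi * y / sigma**2)
        x = post_mean + np.sqrt(post_var) * rng.standard_normal(y.shape)
        # M-step: closed-form MAP update of phi given the sampled x.
        num = np.sum(x * y) / sigma**2 + phi0 / tau**2
        den = np.sum(x * x) / sigma**2 + 1.0 / tau**2
        phi = num / den
    return phi, x

# Synthetic data with a gain that the algorithm does not know.
data_rng = np.random.default_rng(42)
phi_true, sigma, mu0, s0 = 1.5, 0.5, 2.0, 1.0
x_true = mu0 + s0 * data_rng.standard_normal(5000)
y = phi_true * x_true + sigma * data_rng.standard_normal(5000)

phi_hat, x_hat = toy_blind_em(y, sigma, mu0, s0)
print(f"estimated gain: {phi_hat:.3f} (true: {phi_true})")
```

The non-zero prior mean `mu0` is what makes the gain identifiable in this toy; in the actual blind deblurring and pose-free reconstruction tasks, that role would be played by the much richer image prior.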

Paper Structure (40 sections, 21 equations, 11 figures, 3 tables, 2 algorithms)

Figures (11)

  • Figure 1: We apply our method to two representative blind inverse problems: 2D Blind Deblurring and Pose-free Sparse-view 3D Reconstruction. Notably, in the 2D task, our method achieves more accurate image recovery and kernel estimation than BlindDPS chung2022parallel, while in the 3D task it successfully reconstructs consistent novel-view images from unposed input views.
  • Figure 2: Overview of LatentDEM. Top: One EM iteration. Given the currently estimated data and kernel, the E-step draws new samples from the LDM prior using the proposed annealing technique, and the M-step applies maximum a posteriori (MAP) estimation to update the forward parameters. Middle: Evolution of the optimized signals and forward parameters.
  • Figure 3: Blind motion deblurring results. Rows 1-2: ImageNet. Rows 3-4: FFHQ. Our method successfully recovers clean images and accurate blur kernels, consistently outperforming all the baselines, even in challenging cases where the observations are severely degraded.
  • Figure 4: Pose-free sparse-view 3D reconstruction results. Our method successfully synthesizes consistent novel view images given two sparse input views. In contrast, Zero123 liu2023zero produces images missing the engine handle that are not consistent with the input views, while LEAP jiang2022LEAP fails to generate photo-realistic images.
  • Figure 5: Effectiveness of our annealed consistency technique. Left: blur kernel accuracy curve (green) over 10 examples (the shaded band shows the standard deviation). The kernel estimate is highly inaccurate at the beginning but becomes meaningful when $t<600$, which matches the annealing factor curve (red). Right: simply applying LDM priors in blind inversion produces images with severe artifacts due to the fragile latent space, while the annealing technique stabilizes the optimization and yields much better results.
  • ...and 6 more figures