Repulsive Latent Score Distillation for Solving Inverse Problems

Nicolas Zilberstein, Morteza Mardani, Santiago Segarra

TL;DR

This work proposes a multimodal variational approximation with a repulsion mechanism that promotes diversity among particles by penalizing pairwise kernel-based similarity, and extends this framework with an augmented variational distribution that disentangles the latent and data spaces.

Abstract

Score Distillation Sampling (SDS) has been pivotal for leveraging pre-trained diffusion models in downstream tasks such as inverse problems, but it faces two major challenges: $(i)$ mode collapse and $(ii)$ latent space inversion, which become more pronounced in high-dimensional data. To address mode collapse, we introduce a novel variational framework for posterior sampling. Utilizing the Wasserstein gradient flow interpretation of SDS, we propose a multimodal variational approximation with a repulsion mechanism that promotes diversity among particles by penalizing pairwise kernel-based similarity. This repulsion acts as a simple regularizer, encouraging a more diverse set of solutions. To mitigate latent space ambiguity, we extend this framework with an augmented variational distribution that disentangles the latent and data spaces. This repulsive augmented formulation balances computational efficiency, quality, and diversity. Extensive experiments on linear and nonlinear inverse tasks with high-resolution images ($512 \times 512$) using pre-trained Stable Diffusion models demonstrate the effectiveness of our approach.
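
To make the repulsion mechanism concrete, here is a minimal sketch of a pairwise kernel-based repulsion gradient, assuming an RBF kernel with a median-heuristic bandwidth. The function name, the bandwidth rule, and the PyTorch framing are illustrative assumptions, not the paper's implementation.

```python
import math
import torch

def rbf_repulsion_grad(particles, gamma=50.0):
    """Kernel-based pairwise repulsion (illustrative sketch, not the authors' code).

    particles: (N, D) tensor of N flattened latent particles.
    Returns an (N, D) update that pushes similar particles apart.
    """
    n = particles.shape[0]
    diffs = particles.unsqueeze(1) - particles.unsqueeze(0)  # (N, N, D), x_i - x_j
    sq_dists = (diffs ** 2).sum(-1)                          # (N, N)
    # Median heuristic for the RBF bandwidth (a common default; an assumption here).
    h = sq_dists.median() / max(math.log(n), 1e-8) + 1e-8
    k = torch.exp(-sq_dists / h)                             # (N, N) RBF kernel
    # grad_{x_i} k(x_i, x_j) = -(2 / h) * (x_i - x_j) * k_ij; descending on the
    # summed kernel similarity therefore moves x_i away from its neighbors.
    grad_k = (-2.0 / h) * diffs * k.unsqueeze(-1)            # (N, N, D)
    return -gamma * grad_k.sum(dim=1) / n                    # (N, D)
```

Descending on the penalty $\gamma \sum_{j} k(\mathbf{x}_i, \mathbf{x}_j)$ is what the abstract calls penalizing pairwise kernel-based similarity: particles that are close under the kernel receive a large outward push, while well-separated particles are barely affected.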

Paper Structure

This paper contains 67 sections, 41 equations, 32 figures, 10 tables, and 2 algorithms.

Figures (32)

  • Figure 1: Illustration of Repulsive Latent Score Distillation (RLSD): RLSD propagates a set of particles by adding noise and applying two levels of regularization: ($i$) Denoising, via score-matching regularization, which directs particles toward modes of the distribution $p({\mathbf x}_0|{\mathbf y})$ (blue arrows); and ($ii$) Repulsion, which pushes particles apart (red arrows) to explore other regions of the posterior density. During sampling, the repulsion gradient keeps particles separated so they reach different modes, as shown in the upper-right box. (A schematic sketch of this two-level update appears after the figure list.)
  • Figure 2: Inpainting half a face using (from top to bottom): Ground truth and Measurement, PSLD, NonRepuls-RLSD, and RLSD ($\gamma = 50$). We generate four samples for each method, starting from different initializations; for RLSD, the samples interact through the repulsion term. First, both NonRepuls-RLSD and RLSD outperform PSLD across all four images. Second, while images 1 and 3, and images 2 and 4, from RLSD differ noticeably (e.g., images 1 and 3 have the left eye hidden), all samples generated by NonRepuls-RLSD appear quite similar.
  • Figure 3: Results for Phase Retrieval. Adding repulsion between particles allows sampling from different modes (top row).
  • Figure 4: Inpainting half a face using (from top to bottom): Ground truth and Measurement, PSLD, NonRepuls-RLSD, and RLSD ($\gamma = 50$). We generate four samples for each method, each from a different initialization; for RLSD, they interact through the repulsion term. First, both NonRepuls-RLSD and RLSD outperform PSLD for all four images. Second, while images 1 and 3 from RLSD (last row) look different, images 1 and 3 of NonRepuls-RLSD are similar; this illustrates that RLSD promotes diversity.
  • Figure 5: Phase Retrieval. Adding repulsion between particles promotes diversity and allows sampling from different modes.
  • ...and 27 more figures
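
As a rough schematic of the two-level update in Figure 1, the sketch below perturbs the particles with noise, applies a denoising (score-matching plus data-fidelity) gradient, and adds the repulsion term from the earlier sketch. Here `score_net`, `A`, `A_adj`, and the step sizes `lr` and `zeta` are hypothetical placeholders rather than the authors' API.

```python
import torch

@torch.no_grad()
def rlsd_step(particles, y, score_net, A, A_adj, sigma_t, t,
              lr=0.05, zeta=1.0, gamma=50.0):
    """One schematic RLSD-style particle update (illustrative, not the paper's code).

    particles: (N, D) latent particles; y: measurements; A / A_adj: forward
    operator and its adjoint; score_net: pretrained score network s(x, t).
    """
    # (1) Propagate: perturb each particle with noise at the current level sigma_t.
    noisy = particles + sigma_t * torch.randn_like(particles)
    # (2) Denoising (blue arrows in Figure 1): the score pulls particles toward
    #     modes of p(x_0 | y), while the fidelity term keeps them consistent
    #     with the measurement y.
    denoise = score_net(noisy, t) - zeta * A_adj(A(particles) - y)
    # (3) Repulsion (red arrows in Figure 1): the kernel-based term keeps
    #     particles separated so they settle into different modes.
    repel = rbf_repulsion_grad(particles, gamma=gamma)
    return particles + lr * (denoise + repel)
```

Iterating this update across the noise levels of the pretrained diffusion model would yield a set of reconstructions that agree with the measurement $\mathbf{y}$ yet remain mutually distinct, which is the qualitative behavior the figures above illustrate.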