Regularization by Texts for Latent Diffusion Inverse Solvers

Jeongsol Kim, Geon Yeong Park, Hyungjin Chung, Jong Chul Ye

TL;DR

This paper introduces Regularization by Text (TReg), a zero-shot, text-conditioned latent diffusion framework for inverse problems that reduces ill-posedness by constraining the latent space with semantic priors. It couples a text-guided proximal objective with adaptive negation to sharpen semantic alignment while suppressing artifacts, and combines latent diffusion posterior sampling (DPS) updates with an updated null-text embedding to improve data fidelity. Experiments on linear and non-linear tasks (e.g., super-resolution, deblurring, Fourier phase retrieval, inpainting) show reduced ambiguity and closer alignment with textual cues, outperforming several baselines and remaining robust across domains. The approach enables flexible, interpretable control over reconstructions and highlights how linguistic priors can guide image reconstruction in a principled, efficient manner.
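The TL;DR mentions a text-guided proximal objective. One plausible reading is a data-fidelity term plus a penalty that pulls the solution toward a text-conditioned denoised estimate. This reading is assumed here and is not taken from the paper. For inpainting with a binary mask, that kind of objective has a per-pixel closed form, provided you work directly in pixel space (a simplification that treats the latent decoder as the identity). The sketch below is illustrative only. The names `proximal_inpaint`, `z_text`, and `lam` are hypothetical.

```python
from typing import List


def proximal_inpaint(y: List[float], mask: List[int],
                     z_text: List[float], lam: float) -> List[float]:
    """Minimize ||y - mask*z||^2 + lam*||z - z_text||^2 pixel by pixel.

    ASSUMPTIONS (not from the paper): pixel space instead of a latent space,
    a binary inpainting mask (1 = observed, 0 = missing), and z_text standing
    in for a text-conditioned denoised estimate.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not (len(y) == len(mask) == len(z_text)):
        raise ValueError("inputs must have equal length")
    out = []
    for yi, mi, zi in zip(y, mask, z_text):
        # Zero gradient: (m^2 + lam) z = m*y + lam*z_text, and m^2 = m for m in {0, 1}.
        out.append((mi * yi + lam * zi) / (mi + lam))
    return out


# Usage: pixel 0 is observed, pixel 1 is missing.
print(proximal_inpaint(y=[2.0, 0.0], mask=[1, 0], z_text=[0.0, 5.0], lam=1.0))
# [1.0, 5.0]
```

Observed pixels blend the measurement with the text-conditioned estimate according to `lam`. Unobserved pixels fall back entirely on the text-conditioned estimate, which is where a textual prior can resolve what the measurement leaves undetermined.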

Abstract

The recent development of diffusion models has led to significant progress in solving inverse problems by leveraging these models as powerful generative priors. However, challenges persist due to the ill-posed nature of such problems, often arising from ambiguities in measurements or intrinsic system symmetries. To address this, here we introduce a novel latent diffusion inverse solver, regularization by text (TReg), inspired by the human ability to resolve visual ambiguities through perceptual biases. TReg integrates textual descriptions of preconceptions about the solution during reverse diffusion sampling, dynamically reinforcing these descriptions through null-text optimization, which we refer to as adaptive negation. Our comprehensive experimental results demonstrate that TReg effectively mitigates ambiguity in inverse problems, improving both accuracy and efficiency.
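It helps to picture null-text optimization through classifier-free guidance. This assumes the null-text embedding enters sampling the way it does in common latent diffusion pipelines. Under that assumption, the guided noise prediction is ε(z_t, ∅) + w·(ε(z_t, c) − ε(z_t, ∅)), and the null-text embedding ∅ is the direction that guidance pushes away from. To make the sketch runnable without a real model, it uses a hypothetical toy linear noise predictor (`toy_eps`). This summary does not give the rule TReg uses to update ∅ during adaptive negation, so the sketch shows only how changing ∅ affects the guided prediction.

```python
from typing import Callable, List

Vector = List[float]


def cfg_noise(eps_model: Callable[[Vector, Vector], Vector], z_t: Vector,
              text_emb: Vector, null_emb: Vector, w: float) -> Vector:
    """Classifier-free guidance: eps_null + w * (eps_text - eps_null)."""
    eps_text = eps_model(z_t, text_emb)
    eps_null = eps_model(z_t, null_emb)
    return [en + w * (et - en) for et, en in zip(eps_text, eps_null)]


def toy_eps(z_t: Vector, emb: Vector) -> Vector:
    # HYPOTHETICAL stand-in for a text-conditioned noise predictor.
    return [zi + 0.5 * ei for zi, ei in zip(z_t, emb)]


z_t, text_emb = [0.0, 1.0], [1.0, 0.0]
# Fixed "empty" null embedding.
print(cfg_noise(toy_eps, z_t, text_emb, [0.0, 0.0], 7.5))  # [3.75, 1.0]
# Null embedding that carries an unwanted concept (second coordinate):
# guidance now pushes the prediction away from that concept.
print(cfg_noise(toy_eps, z_t, text_emb, [0.0, 1.0], 7.5))  # [3.75, -2.25]
```

If `null_emb` encodes an unwanted concept, guidance pushes the prediction away from that concept. This is the intuition for replacing a fixed empty prompt with one that is updated adaptively during sampling.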

Paper Structure (33 sections, 23 equations, 21 figures, 6 tables, 2 algorithms)

Figures (21)

  • Figure 1: Representative solutions obtained by TReg for various inverse problems. TReg optimizes both data consistency and the semantic alignment of the solution with textual cues by reducing the solution space with a text-conditional latent regularizer. This serves as effective semantic guidance throughout the reconstruction process.
  • Figure 2: (a) Concept of adaptive negation. Unlike concept emphasis and concept negation, which target a specific concept, adaptive negation aims to suppress all concepts except the desired one. (b) Adaptive negation is crucial for avoiding artifacts in the reconstruction.
  • Figure 3: TReg effectively reduces the ambiguity of the solution with text-based regularization. (a) Given measurement and text description. (b) Multiple reconstructions and pixel-wise variance without and with text regularization. (c) Variance measured along the white dotted line on the uncertainty map.
  • Figure 4: Reconstruction by TReg where the original class is given as text description: "A photo of <class>."
  • Figure 5: Reconstructions when the given text prompt differs from the original class.
  • ...and 16 more figures
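Figure 3 measures ambiguity as the pixel-wise variance across several reconstructions of the same measurement, then reads that variance off along a line. The sketch below reproduces that computation under several assumptions. Images are equally sized grayscale arrays stored as nested lists. The estimator is population variance, since the caption does not say which one is used. The line is horizontal and is selected by a hypothetical `row` index.

```python
from statistics import pvariance
from typing import List

Image = List[List[float]]  # one grayscale image, indexed [row][col]


def variance_map(recons: List[Image]) -> Image:
    """Per-pixel population variance across reconstructions of one measurement."""
    if len(recons) < 2:
        raise ValueError("need at least two reconstructions")
    h, w = len(recons[0]), len(recons[0][0])
    return [
        [float(pvariance([r[i][j] for r in recons])) for j in range(w)]
        for i in range(h)
    ]


def row_profile(vmap: Image, row: int) -> List[float]:
    """Variance values along one horizontal line of the uncertainty map."""
    return list(vmap[row])


# Usage: two toy 2x2 reconstructions that disagree on some pixels.
recons = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 2.0], [0.0, 4.0]],
]
vmap = variance_map(recons)
print(vmap)                  # [[1.0, 1.0], [0.0, 4.0]]
print(row_profile(vmap, 1))  # [0.0, 4.0]
```

The caption reports reduced ambiguity with text regularization. In this sketch, that would appear as lower values in the `variance_map` output and along the `row_profile` line.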