Regularization by Texts for Latent Diffusion Inverse Solvers
Jeongsol Kim, Geon Yeong Park, Hyungjin Chung, Jong Chul Ye
TL;DR
This paper introduces Regularization by Text (TReg), a zero-shot, text-conditioned latent diffusion framework for inverse problems that reduces ill-posedness by constraining the latent space with semantic priors. It couples a text-guided proximal objective with adaptive negation to sharpen semantic alignment while suppressing artifacts, and combines latent DPS (diffusion posterior sampling) updates with the optimized null-text to improve data fidelity. Empirical results across linear and non-linear tasks (e.g., super-resolution, deblurring, Fourier phase retrieval, inpainting) show reduced ambiguity and closer alignment to textual cues, outperforming several baselines and demonstrating robustness across domains. The approach enables flexible, interpretable control over reconstructions and shows how linguistic priors can guide image reconstruction.
Abstract
The recent development of diffusion models has led to significant progress in solving inverse problems by leveraging these models as powerful generative priors. However, challenges persist due to the ill-posed nature of such problems, which often arises from ambiguities in measurements or intrinsic system symmetries. To address this, we introduce a novel latent diffusion inverse solver, Regularization by Text (TReg), inspired by the human ability to resolve visual ambiguities through perceptual biases. TReg integrates textual descriptions of preconceptions about the solution during reverse diffusion sampling and dynamically reinforces these descriptions through null-text optimization, which we refer to as adaptive negation. Our comprehensive experimental results demonstrate that TReg effectively mitigates ambiguity in inverse problems, improving both accuracy and efficiency.
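To make the "intrinsic system symmetries" behind this ill-posedness concrete, the following minimal sketch uses Fourier phase retrieval, one of the tasks listed in the TL;DR. It is an illustration only and is not taken from the paper: the helper name `fourier_magnitude`, the random 16x16 test image, and the chosen shift are assumptions made here. It shows that a circularly shifted or flipped image yields the same Fourier-magnitude measurement as the original, so the measurement alone cannot distinguish between them.

```python
import numpy as np


def fourier_magnitude(image: np.ndarray) -> np.ndarray:
    """Hypothetical measurement operator: magnitude of the 2D DFT (phase is discarded)."""
    return np.abs(np.fft.fft2(image))


# Random real-valued test image (an assumption for illustration, not data from the paper).
rng = np.random.default_rng(0)
image = rng.random((16, 16))

# Two different candidate solutions: a circular shift and a spatial flip.
shifted = np.roll(image, shift=(3, 5), axis=(0, 1))
flipped = np.flip(image, axis=(0, 1))

# All three images produce the same measurement up to floating-point error.
y = fourier_magnitude(image)
same_under_shift = np.allclose(y, fourier_magnitude(shifted))
same_under_flip = np.allclose(y, fourier_magnitude(flipped))

# Yet the images themselves differ, so the inverse problem has no unique solution.
images_differ = (not np.allclose(image, shifted)) and (not np.allclose(image, flipped))

print(same_under_shift, same_under_flip, images_differ)
```

Because several distinct images explain the same measurement, extra prior information is needed to select one; in TReg, that role is played by the textual description.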
