Spectral Collapse in Diffusion Inversion
Nicolas Bourriez, Alexandre Verine, Auguste Genovesio
TL;DR
The paper identifies spectral collapse as a key bottleneck in deterministic diffusion inversion when translating spectrally sparse inputs to spectrally dense targets. It analyzes how the inversion dynamics and the choice of prediction target ($\epsilon$ vs. $\mathbf{x}_0$) influence latent statistics, showing that naive approaches fail to recover high-frequency textures. To address this, the authors propose Orthogonal Variance Guidance (OVG), which injects high-frequency variance while constraining updates to the null-space of the structural gradient, achieving both texture realism and structural fidelity. Experiments on BBBC021 and Edges2Shoes show that EDM+OVG yields superior perceptual texture quality and robust structure preservation, expanding the Pareto frontier beyond prior deterministic and stochastic methods. The work provides theoretical insights into the spectral properties of diffusion inversion and a practical, inference-time tool for unpaired image translation across spectrally asymmetric domains.
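To make the null-space constraint concrete, the following is a minimal NumPy sketch of the underlying linear-algebra step only, not the authors' implementation. It assumes the structural gradient is available as a single array `structural_grad` and that some variance-restoring direction `variance_update` has already been computed; both names, the random placeholder data, and the helper `project_orthogonal` are hypothetical. The projection removes the component of the update that lies along the gradient, so that, to first order, the update does not change the structural objective.

```python
import numpy as np

def project_orthogonal(update, structural_grad, eps=1e-12):
    """Remove from `update` its component along `structural_grad`.

    The result is orthogonal to the gradient, i.e. it lies in the
    null-space of the linear map d -> <structural_grad, d>.
    """
    g = structural_grad.ravel()
    u = update.ravel()
    coeff = np.dot(u, g) / (np.dot(g, g) + eps)  # eps guards against a zero gradient
    return (u - coeff * g).reshape(update.shape)

# Hypothetical placeholders: random arrays standing in for real quantities.
rng = np.random.default_rng(0)
structural_grad = rng.standard_normal((3, 8, 8))  # assumed gradient of a structure loss
variance_update = rng.standard_normal((3, 8, 8))  # assumed variance-restoring direction

projected = project_orthogonal(variance_update, structural_grad)
residual = float(np.dot(projected.ravel(), structural_grad.ravel()))
print("inner product with gradient after projection:", residual)
```

How OVG computes the gradient and the variance correction is specified in the paper itself; this sketch only illustrates why an update restricted to this subspace can add variance without moving along the structural gradient.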
Abstract
Conditional diffusion inversion provides a powerful framework for unpaired image-to-image translation. However, we demonstrate through an extensive analysis that standard deterministic inversion (e.g., DDIM) fails when the source domain is spectrally sparse compared to the target domain (e.g., super-resolution, sketch-to-image). In these settings, the latent recovered from the input does not follow the expected isotropic Gaussian distribution. Instead, it retains a low-frequency signal, locking target sampling into oversmoothed, texture-poor generations. We term this phenomenon spectral collapse. We observe that stochastic alternatives that attempt to restore the noise variance tend to break the semantic link to the input, leading to structural drift. To resolve this structure-texture trade-off, we propose Orthogonal Variance Guidance (OVG), an inference-time method that corrects the ODE dynamics to enforce the theoretical Gaussian noise magnitude within the null-space of the structural gradient. Extensive experiments on microscopy super-resolution (BBBC021) and sketch-to-image (Edges2Shoes) demonstrate that OVG effectively restores photorealistic textures while preserving structural fidelity.
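One simple way to see what a spectrally collapsed latent looks like is to compare how much of its power sits at high spatial frequencies against a white Gaussian reference, whose spectrum is flat in expectation. The sketch below is an illustrative diagnostic, not the metric used in the paper: the function `high_freq_fraction`, the cutoff of 0.25 cycles per pixel, and the box-blurred array standing in for a collapsed latent are all assumptions made for this example.

```python
import numpy as np

def high_freq_fraction(x, cutoff=0.25):
    """Fraction of spectral power at radial frequency above `cutoff` (cycles/pixel)."""
    h, w = x.shape
    power = np.abs(np.fft.fft2(x)) ** 2
    fy = np.fft.fftfreq(h)[:, None]   # vertical frequencies in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]   # horizontal frequencies in [-0.5, 0.5)
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return power[radius > cutoff].sum() / power.sum()

rng = np.random.default_rng(0)

# Reference: an isotropic Gaussian latent (flat spectrum in expectation).
white = rng.standard_normal((64, 64))

# Stand-in for a collapsed latent: the same noise after a 4x4 circular box blur,
# which suppresses high frequencies. This is a toy proxy, not a real inversion.
smooth = sum(
    np.roll(white, (dy, dx), axis=(0, 1)) for dy in range(4) for dx in range(4)
) / 16.0

white_frac = high_freq_fraction(white)
smooth_frac = high_freq_fraction(smooth)
print("high-frequency fraction, white noise:", white_frac)
print("high-frequency fraction, blurred proxy:", smooth_frac)
```

A latent obtained by deterministic inversion could be passed to the same function channel by channel; a fraction well below that of the white reference would be consistent with the low-frequency bias described above.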
