There and Back Again: On the relation between Noise and Image Inversions in Diffusion Models
Łukasz Staniszewski, Łukasz Kuciński, Kamil Deja
TL;DR
This work analyzes DDIM inversion in diffusion models and shows that the early inversion steps produce biased, less diverse noise predictions in plain image regions, so the resulting latents deviate from Gaussian statistics and exhibit correlations. The authors show that these deviations make latent encodings harder to manipulate for editing and interpolation. They propose a simple fix: replace the first few inversion steps with forward diffusion, which decorrelates the latents and improves editing, interpolation, and stochastic editing of real images at a minimal cost in reconstruction quality. The approach is validated across multiple models and tasks, offering a practical way to improve the controllability of diffusion-based image editing and interpolation. The code is open source.
Abstract
Diffusion models achieve state-of-the-art performance in generating new samples but lack a low-dimensional latent space that encodes the data into editable features. Inversion-based methods address this by reversing the denoising trajectory, mapping images to an approximation of their starting noise. In this work, we thoroughly analyze this procedure and focus on the relation between the initial noise, the generated samples, and their corresponding latent encodings obtained through DDIM inversion. First, we show that latents exhibit structural patterns in the form of less diverse noise predicted for smooth image areas (e.g., plain sky). Through a series of analyses, we trace this issue to the first inversion steps, which fail to provide accurate and diverse noise. Consequently, the DDIM inversion space is notably less manipulable than the original noise. We show that prior inversion methods do not fully resolve this issue, but our simple fix, where we replace the first DDIM inversion steps with a forward diffusion process, successfully decorrelates latent encodings and enables higher-quality edits and interpolations. The code is available at https://github.com/luk-st/taba.
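The fix described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a standard linear beta schedule, a noise predictor passed in as a hypothetical callable `eps_model(x, t)`, and a hypothetical parameter `num_forward_steps` for how many initial inversion steps are replaced. All function names here are illustrative.

```python
import numpy as np


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)


def forward_diffuse(x0, alpha_bar_t, rng):
    """Forward diffusion: noise x0 to a given level with fresh Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


def ddim_inversion_step(x_t, eps, alpha_bar_t, alpha_bar_next):
    """One deterministic DDIM inversion step from noise level t to a higher level."""
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    return np.sqrt(alpha_bar_next) * x0_pred + np.sqrt(1.0 - alpha_bar_next) * eps


def invert_with_forward_start(x0, eps_model, alpha_bars, timesteps,
                              num_forward_steps, rng):
    """Replace the first `num_forward_steps` inversion steps with forward diffusion.

    `timesteps` is an increasing list of indices into `alpha_bars`.
    `eps_model(x, t)` is a hypothetical stand-in for a trained noise predictor.
    """
    if not 0 <= num_forward_steps < len(timesteps):
        raise ValueError("num_forward_steps must be in [0, len(timesteps))")

    if num_forward_steps == 0:
        # Plain DDIM inversion: treat the image as the state at timesteps[0].
        x = np.array(x0, dtype=float)
    else:
        # Jump directly to the noise level of the first kept timestep.
        t_start = timesteps[num_forward_steps]
        x = forward_diffuse(x0, alpha_bars[t_start], rng)

    # Run the remaining steps with ordinary DDIM inversion.
    remaining = timesteps[num_forward_steps:]
    for t, t_next in zip(remaining[:-1], remaining[1:]):
        eps = eps_model(x, t)
        x = ddim_inversion_step(x, eps, alpha_bars[t], alpha_bars[t_next])
    return x
```

Setting `num_forward_steps=0` recovers plain DDIM inversion, which makes the two variants easy to compare on the same image. Because the forward-diffusion start samples fresh noise, the resulting latent is stochastic unless the random generator is seeded.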
