An Iteration-Free Fixed-Point Estimator for Diffusion Inversion
Yifei Chen, Kaiyu Song, Yan Pan, Jianxing Yu, Jian Yin, Hanjiang Lai
TL;DR
The paper improves the efficiency of diffusion inversion by removing the iterative refinement performed at each inversion step. It derives an explicit fixed-point expression for the unknown latent and couples it with an error approximation taken from the previous inversion step, producing an iteration-free estimator (IFE) that is unbiased and has low variance. The method achieves reconstruction quality on par with multi-iteration fixed-point methods while substantially reducing computation, as demonstrated on MS-COCO and NOCAPS. This offers a practical, training-free solution to high-fidelity diffusion inversion, with potential applicability to a range of inversion-based downstream tasks. The authors also acknowledge the current focus on DDIM and point to future work on stochastic samplers and broader inversion paradigms.
Abstract
Diffusion inversion aims to recover the initial noise corresponding to a given image such that this noise can reconstruct the original image through the denoising diffusion process. The key component of diffusion inversion is to minimize errors at each inversion step, thereby mitigating cumulative inaccuracies. Recently, fixed-point iteration has emerged as a widely adopted approach to minimize reconstruction errors at each inversion step. However, it suffers from high computational costs due to its iterative nature and the complexity of hyperparameter selection. To address these issues, we propose an iteration-free fixed-point estimator for diffusion inversion. First, we derive an explicit expression of the fixed point from an ideal inversion step. This expression inherently contains an unknown data prediction error. To handle it, we introduce an error approximation, which uses the calculable error from the previous inversion step to approximate the unknown error at the current inversion step. This yields a calculable, approximate expression for the fixed point, which is an unbiased estimator characterized by low variance, as shown by our theoretical analysis. We evaluate reconstruction performance on two text-image datasets, NOCAPS and MS-COCO. Compared to DDIM inversion and other inversion methods based on fixed-point iteration, our method achieves consistent and superior performance in reconstruction tasks without additional iterations or training.
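
To make the baseline concrete, the following is a minimal sketch of fixed-point iteration for a single deterministic DDIM inversion step, the iterative approach the abstract contrasts against. It is not the proposed iteration-free estimator, whose explicit expression is not given here. The noise predictor `toy_eps`, the schedule values `a_t` and `a_prev`, and all function names are hypothetical stand-ins chosen for illustration; a real setup would use a trained diffusion model.

```python
import numpy as np

def toy_eps(x, t):
    """Hypothetical stand-in for a trained noise predictor eps_theta(x, t)."""
    return np.tanh(0.5 * x + 0.1 * t)

def ddim_step(x_t, a_t, a_prev, t, eps_fn):
    """One deterministic DDIM denoising step x_t -> x_{t-1}.

    a_t and a_prev are the cumulative alpha products at steps t and t-1.
    """
    eps = eps_fn(x_t, t)
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    return np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps

def invert_step(x_prev, a_t, a_prev, t, eps_fn, n_iters):
    """Approximate the x_t that ddim_step maps back to x_prev.

    Solving the DDIM update for x_t gives the implicit equation
        x_t = ratio * x_prev + coef * eps_fn(x_t, t),
    which is iterated from the initial guess x_t = x_prev.
    With n_iters=1 this reduces to plain DDIM inversion.
    """
    ratio = np.sqrt(a_t / a_prev)
    coef = np.sqrt(1.0 - a_t) - ratio * np.sqrt(1.0 - a_prev)
    x_t = x_prev
    for _ in range(n_iters):
        x_t = ratio * x_prev + coef * eps_fn(x_t, t)
    return x_t

def reconstruction_error(x_prev, a_t, a_prev, t, eps_fn, n_iters):
    """Max absolute error after inverting one step and denoising it again."""
    x_t = invert_step(x_prev, a_t, a_prev, t, eps_fn, n_iters)
    return float(np.max(np.abs(ddim_step(x_t, a_t, a_prev, t, eps_fn) - x_prev)))

# Assumed toy values, not taken from the paper.
x_prev = np.array([0.3, -1.2, 0.8])
a_t, a_prev, t = 0.5, 0.6, 0.5

for k in (1, 2, 10):
    err = reconstruction_error(x_prev, a_t, a_prev, t, toy_eps, k)
    print(f"iterations={k:2d}  reconstruction error={err:.3e}")
```

Each extra iteration costs one more evaluation of the noise predictor per inversion step, and the iteration count is a hyperparameter. Both are the costs that the iteration-free estimator is designed to avoid.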
