Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing
Yangyang Xu, Wenqi Shao, Yong Du, Haiming Zhu, Yang Zhou, Ping Luo, Shengfeng He
TL;DR
TODInv introduces Task-Oriented Diffusion Inversion, a framework that inverts real images and edits them by optimizing prompt embeddings in the extended space $\mathcal{P}^*$, which assigns distinct embeddings across U-Net layers and timesteps. By categorizing each edit as a structure, appearance, or global edit, TODInv updates only the embeddings that the current edit does not affect, balancing high reconstruction fidelity with precise editability. Empirical results on PIE-Bench and experiments with few-step diffusion models demonstrate superior reconstruction quality and editing performance over state-of-the-art inversion methods, while maintaining efficiency. The approach offers a principled route to reliable, controllable text-based editing of real images, and it applies across diverse editing tools and diffusion backbones.
Abstract
Recent advances in text-guided diffusion models have unlocked powerful image manipulation capabilities, yet balancing reconstruction fidelity and editability for real images remains a significant challenge. In this work, we introduce \textbf{T}ask-\textbf{O}riented \textbf{D}iffusion \textbf{I}nversion (\textbf{TODInv}), a novel framework that inverts and edits real images in a manner tailored to the specific editing task by optimizing prompt embeddings within the extended \(\mathcal{P}^*\) space. By using distinct embeddings across different U-Net layers and time steps, TODInv integrates inversion and editing through reciprocal optimization, ensuring both high fidelity and precise editability. This hierarchical editing mechanism categorizes tasks into structure, appearance, and global edits, and optimizes only the embeddings unaffected by the current editing task. Extensive experiments on a benchmark dataset show that TODInv outperforms existing methods both quantitatively and qualitatively, and demonstrate its versatility with few-step diffusion models.
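
The task-oriented selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the layer names, the split of layers into a "structure" group and an "appearance" group, and the helper names `build_embedding_keys` and `trainable_keys` are all hypothetical. The sketch only shows the bookkeeping: one prompt-embedding slot per (layer, timestep) pair, with optimization restricted to the slots that the current edit type does not affect. How global edits are treated is not specified here, so the sketch simply assumes they affect both groups.

```python
from enum import Enum
from itertools import product


class EditType(Enum):
    STRUCTURE = "structure"
    APPEARANCE = "appearance"
    GLOBAL = "global"


# ASSUMPTION: which layer group each edit type affects. The abstract only says
# that embeddings unaffected by the edit are optimized; this mapping is illustrative.
AFFECTED_GROUPS = {
    EditType.STRUCTURE: {"structure"},
    EditType.APPEARANCE: {"appearance"},
    EditType.GLOBAL: {"structure", "appearance"},
}


def build_embedding_keys(layer_groups, num_timesteps):
    """Enumerate one embedding slot per (layer, timestep) pair."""
    layers = [layer for group in layer_groups.values() for layer in group]
    return list(product(layers, range(num_timesteps)))


def trainable_keys(edit_type, layer_groups, num_timesteps):
    """Return the (layer, timestep) slots left free for optimization."""
    affected = AFFECTED_GROUPS[edit_type]
    free_layers = {
        layer
        for group_name, layers in layer_groups.items()
        if group_name not in affected
        for layer in layers
    }
    all_keys = build_embedding_keys(layer_groups, num_timesteps)
    return [(layer, t) for (layer, t) in all_keys if layer in free_layers]


# Hypothetical layer names and grouping, chosen only for this example.
example_groups = {
    "structure": ["down_0", "down_1"],
    "appearance": ["up_0", "up_1"],
}

structure_edit_slots = trainable_keys(EditType.STRUCTURE, example_groups, 2)
appearance_edit_slots = trainable_keys(EditType.APPEARANCE, example_groups, 2)
```

Under these assumptions, a structure edit leaves only the appearance-group slots trainable and an appearance edit leaves only the structure-group slots trainable. In a real pipeline, the returned keys would select which embedding tensors receive gradients during inversion.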
