Image Inversion: A Survey from GANs to Diffusion and Beyond

Yinan Chen, Jiangning Zhang, Yali Bi, Xiaobin Hu, Teng Hu, Zhucun Xue, Ran Yi, Yong Liu, Ying Tai

TL;DR

Given a real image $x_0$, image inversion recovers a latent representation from which a pretrained generator can reconstruct, and then edit, the image. GAN inversion seeks a latent code $z^*$ such that $x = G(z^*) \approx x_0$, while diffusion inversion targets a latent $z_T^*$ defined by $z_T^* = \underset{z_T}{\mathrm{argmin}}\, \mathcal{L}(z_T, x_0)$. The survey classifies GAN inversion methods into encoder-based, latent-optimization, and hybrid approaches, and diffusion inversion methods into training-free, fine-tuning, and extra-trainable-module strategies, while highlighting emerging techniques such as DiT and Rectified Flow. It provides a unified framework for comparing methods, synthesizes key techniques, and discusses applications, open problems, and directions for future research. The work aims to help practitioners balance high-fidelity reconstruction against editability in image inversion and in its extensions beyond images.
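The latent-optimization view of GAN inversion above can be illustrated with a small sketch. It is not the survey's method: it assumes a toy linear map as a stand-in for a pretrained generator and a plain squared-error loss as the choice of $\mathcal{L}$, and the names `W`, `b`, `G`, and `invert` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pretrained generator: a fixed linear map G(z) = W z + b.
latent_dim, image_dim = 4, 16
W = rng.normal(size=(image_dim, latent_dim))
b = rng.normal(size=image_dim)


def G(z):
    """Toy generator mapping a latent code to a flattened 'image'."""
    return W @ z + b


def invert(x0, steps=2000):
    """Latent optimization: approximate z* = argmin_z ||G(z) - x0||^2 by gradient descent."""
    z = np.zeros(latent_dim)
    # Step size 1/L, where L = 2 * sigma_max(W)^2 is the Lipschitz constant of the gradient.
    lr = 1.0 / (2.0 * np.linalg.norm(W, 2) ** 2)
    for _ in range(steps):
        residual = G(z) - x0          # reconstruction error in image space
        grad = 2.0 * W.T @ residual   # closed-form gradient of the squared error w.r.t. z
        z = z - lr * grad
    return z


# A "real image" that lies in the generator's range, so exact inversion is possible.
z_true = rng.normal(size=latent_dim)
x0 = G(z_true)

z_star = invert(x0)
recon_error = float(np.linalg.norm(G(z_star) - x0))
```

With a real generator, the gradient would come from automatic differentiation rather than the closed form used here, and $x_0$ would generally not lie exactly in the generator's range, so the reconstruction error would not vanish.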

Abstract

Image inversion is a fundamental task in generative models, aiming to map images back to their latent representations to enable downstream applications such as editing, restoration, and style transfer. This paper provides a comprehensive review of the latest advancements in image inversion techniques, focusing on two main paradigms: Generative Adversarial Network (GAN) inversion and diffusion model inversion. We categorize these techniques based on their optimization methods. For GAN inversion, we systematically classify existing methods into encoder-based approaches, latent optimization approaches, and hybrid approaches, analyzing their theoretical foundations, technical innovations, and practical trade-offs. For diffusion model inversion, we explore training-free strategies, fine-tuning methods, and the design of additional trainable modules, highlighting their unique advantages and limitations. Additionally, we discuss several popular downstream applications and emerging applications beyond image tasks, identifying current challenges and future research directions. By synthesizing the latest developments, this paper aims to provide researchers and practitioners with a valuable reference resource, promoting further advancements in the field of image inversion. We keep track of the latest works at https://github.com/RyanChenYN/ImageInversion

Paper Structure

This paper contains 14 sections, 6 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Left: Schematic overview of the general formulation of image inversion (I) and its instantiation in the GAN (II) and diffusion (III) frameworks. Right: Summary of works on the different frameworks in recent years. Only works from the past four years are listed. Owing to the superior performance of diffusion models, interest in GAN-based work has declined year over year.
  • Figure 2: A taxonomy of generative model inversion approaches from GANs to Diffusion and beyond.