LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space

Xuekun Zhao, Pu Cao, Xiaoya Yang, Mingjian Zhang, Lu Yang, Qing Song

TL;DR

This work tackles the GAN inversion problem by showing that perception and editability hinge on how well embedded latent codes align with the GAN’s synthetic distribution. It introduces the Latent Space Alignment Inversion Paradigm (LSAP), which provides a unified alignment objective via the Normalized Style Space ($\mathcal{S}^\mathcal{N}$) and the differentiable Normalized Style Space Cosine Distance (NSCD). LSAP yields encoder-based and optimization-based solutions (LSAP$_E$ and LSAP$_O$) that improve perception and editability while maintaining fidelity, and it achieves state-of-the-art performance when integrated with existing refinement methods. The approach is validated across multiple domains, demonstrating robust improvements in both stages of inversion and offering a practical, distribution-aware framework for future GAN inversion research.

Abstract

As research on image inversion advances, the process is generally divided into two stages. The first stage, Image Embedding, uses an encoder or an optimization procedure to embed an image and obtain its corresponding latent code. The second stage, Result Refinement, further improves the inversion and editing outcomes. Although this refinement stage substantially enhances reconstruction fidelity, perception and editability remain largely unchanged and depend heavily on the latent codes produced by the first stage. A key challenge is therefore to obtain latent codes that preserve reconstruction fidelity while also improving perception and editability. In this work, we first reveal that these two properties are closely related to the degree of alignment (or disalignment) between the inverted latent codes and the synthetic distribution. Based on this insight, we propose the Latent Space Alignment Inversion Paradigm (LSAP), which combines an evaluation metric with a unified inversion solution. Specifically, we introduce the Normalized Style Space ($\mathcal{S}^\mathcal{N}$ space) and the Normalized Style Space Cosine Distance (NSCD) to quantify the disalignment of inversion methods. Moreover, our paradigm can be optimized for both encoder-based and optimization-based embeddings, providing a consistent alignment framework. Extensive experiments across various domains demonstrate that NSCD effectively captures perceptual and editable characteristics, and that our alignment paradigm achieves state-of-the-art performance in both stages of inversion.
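
The abstract names NSCD but does not give its formula. As a purely hypothetical illustration of what a cosine distance over per-layer normalized style codes could look like, the sketch below normalizes each layer's style vector to unit length and averages the per-layer cosine distance to a reference code. The function names, the per-layer unit normalization, the choice of reference, and the averaging are all assumptions, not the paper's definition.

```python
import numpy as np

def unit_normalize(layers, eps=1e-12):
    # Assumption: each layer's style vector is scaled to unit L2 norm.
    return [v / (np.linalg.norm(v) + eps) for v in layers]

def mean_cosine_distance(code, reference):
    # Hypothetical distance: 1 - cosine similarity per layer, averaged over layers.
    code_n = unit_normalize(code)
    ref_n = unit_normalize(reference)
    dists = [1.0 - float(np.dot(c, r)) for c, r in zip(code_n, ref_n)]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
# Stand-in "reference" code with three layers of different widths (illustrative sizes).
reference = [rng.normal(size=d) for d in (512, 512, 256)]
rescaled = [2.5 * v for v in reference]   # same directions, different magnitude
flipped = [-v for v in reference]         # opposite directions

d_same = mean_cosine_distance(reference, reference)
d_rescaled = mean_cosine_distance(rescaled, reference)
d_flipped = mean_cosine_distance(flipped, reference)
```

Under these assumptions the distance ignores per-layer magnitude and depends only on direction, so a rescaled code is at distance near 0 from the reference while a sign-flipped code is at distance near 2. Every operation is differentiable away from zero-norm vectors, which is the property such a quantity needs to serve as a training or optimization loss.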

Paper Structure (20 sections, 4 theorems, 17 equations, 10 figures, 4 tables)

Key Result

Proposition 1

Suppose that $s=\{s_1, s_2, \dots, s_k\}$ is a set of $\mathcal{S}$-space latent codes corresponding to the image $x=G_\mathcal{S}(s)$. For any $a \in \mathbb{R}^+$ and any $l \in \{1, \cdots, k\}$, if $s^\prime=\{s^\prime_1, s^\prime_2, \dots, s^\prime_k\}$ follows: we have $x=G_\mathcal{S}(s)=G_\mathcal{S}(s^\prime)$.
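
The condition that $s^\prime$ must satisfy is not shown in the statement above. One hypothetical reading, consistent with the quantifiers over $a$ and $l$, is that $s^\prime$ differs from $s$ only by a positive rescaling of the $l$-th style vector. StyleGAN2-style weight demodulation does have this kind of scale invariance, which the NumPy sketch below checks for a single convolution layer. The function name `modulated_weight`, the tensor shapes, and the choice `eps=0` are assumptions made for illustration.

```python
import numpy as np

def modulated_weight(weight, style, eps=0.0):
    """Modulate conv weights by a style vector, then demodulate.

    weight: (out_ch, in_ch, kh, kw), style: (in_ch,).
    eps=0 is an assumption; implementations use a small positive value,
    which makes the invariance approximate rather than exact.
    """
    w = weight * style[None, :, None, None]                         # modulation
    norm = np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    return w / norm                                                 # demodulation

rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 4, 3, 3))
style = rng.normal(size=4)
a = 3.7  # any positive scalar

w_ref = modulated_weight(weight, style)
w_scaled = modulated_weight(weight, a * style)
max_diff = float(np.abs(w_ref - w_scaled).max())
```

Scaling the style by $a$ scales both the modulated weights and their per-output-channel norm by $a$, so the ratio is unchanged and `max_diff` is zero up to floating-point error. Layers without demodulation would not share this property, so the sketch speaks only to demodulated layers.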

Figures (10)

  • Figure 1: Inversion and editing results produced by LSAP and SAM$_{LSAP}$ (Parmar et al., 2022). Our method enhances image quality and editability while preserving reconstruction fidelity. It is compatible with the two-stage inversion framework and achieves better performance.
  • Figure 2: Illustration of Latent Space Distributions. We invert all images from the CelebA-HQ test split into the latent space and visualize their distribution in the $\mathcal{S}^\mathcal{N}$ space. Our alignment solution ensures that the embedded latent codes are located in the high-probability regions of the synthetic distribution, thereby preserving both perception and editability.
  • Figure 3: Alignment inversion solutions of LSAP. We show the details of the encoder-based and optimization-based inversion methods in our alignment paradigm. The pivotal component is $L_{NSCD}$, which measures the degree of disalignment of the inverted latent codes.
  • Figure 4: Inversion and editing results of encoder-based and two-stage inversion methods on the face domain. We compare encoder-based, optimization-based, and two-stage approaches. LSAP$_E$ enhances perception and editability while maintaining fidelity, and HFGI$_{LSAP}$, SAM$_{LSAP}$ and PTI$_{LSAP}$ further reduce image distortion.
  • Figure 5: Editability effects of LSAP for optimization-based methods. LSAP enhances the editability of optimized latent codes and improves image quality in both the $\mathcal{W}$ and $\mathcal{W}^+$ spaces.
  • ...and 5 more figures

Theorems & Definitions (8)

  • Proposition 1
  • proof
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Corollary 1