LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space
Xuekun Zhao, Pu Cao, Xiaoya Yang, Mingjian Zhang, Lu Yang, Qing Song
TL;DR
This work tackles the GAN inversion problem by showing that perception and editability hinge on how well embedded latent codes align with the GAN’s synthetic distribution. It introduces the Latent Space Alignment Inversion Paradigm (LSAP), which provides a unified alignment objective via the Normalized Style Space ($\mathcal{S^N}$) and the differentiable Normalized Style Space Cosine Distance (NSCD). LSAP yields encoder-based and optimization-based solutions (LSAP$_E$ and LSAP$_O$) that improve perception and editability while maintaining fidelity, and it achieves state-of-the-art performance when integrated with existing refinement methods. The approach is validated across multiple domains, demonstrating robust improvements in both stages of inversion and offering a practical, distribution-aware framework for future GAN inversion research.
Abstract
As research on image inversion advances, the process is generally divided into two stages. The first stage, Image Embedding, uses an encoder or an optimization procedure to embed an image and obtain its corresponding latent code. The second stage, Result Refinement, further improves the inversion and editing outcomes. Although this refinement stage substantially enhances reconstruction fidelity, perception and editability remain largely unchanged and depend heavily on the latent codes obtained in the first stage. Therefore, a key challenge lies in obtaining latent codes that preserve reconstruction fidelity while simultaneously improving perception and editability. In this work, we first reveal that these two properties are closely related to the degree of alignment (or disalignment) between the inverted latent codes and the synthetic distribution. Based on this insight, we propose the \textbf{Latent Space Alignment Inversion Paradigm (LSAP)}, which integrates both an evaluation metric and a unified inversion solution. Specifically, we introduce the \textbf{Normalized Style Space ($\mathcal{S^N}$ space)} and the \textbf{Normalized Style Space Cosine Distance (NSCD)} to quantify the disalignment of inversion methods. Moreover, our paradigm can be optimized for both encoder-based and optimization-based embeddings, providing a consistent alignment framework. Extensive experiments across various domains demonstrate that NSCD effectively captures perceptual and editable characteristics, and that our alignment paradigm achieves state-of-the-art performance in both stages of inversion.
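To make the idea of a normalized-style-space cosine distance concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not define NSCD, so everything here is an assumption for illustration: that a style code is standardized channel-wise with the mean and standard deviation of synthetic style codes, that the distance is one minus the cosine similarity between two standardized codes, and that the comparison target is some reference style code. The names `normalize_style`, `cosine_distance`, `nscd_sketch`, `reference_style`, `mean`, and `std` are hypothetical.

```python
import math


def normalize_style(style, mean, std, eps=1e-8):
    """Standardize a style code channel-wise.

    Assumption: `mean` and `std` are per-channel statistics estimated from
    style codes of synthetic (generated) images.
    """
    return [(s - m) / (d + eps) for s, m, d in zip(style, mean, std)]


def cosine_distance(a, b, eps=1e-8):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def nscd_sketch(inverted_style, reference_style, mean, std):
    """Hypothetical disalignment score between two style codes.

    Both codes are standardized with the same (assumed) synthetic statistics
    before the cosine distance is taken.
    """
    a = normalize_style(inverted_style, mean, std)
    b = normalize_style(reference_style, mean, std)
    return cosine_distance(a, b)


if __name__ == "__main__":
    mean = [0.0, 0.0]
    std = [1.0, 1.0]
    # Identical codes give a distance close to 0; orthogonal codes give
    # a distance close to 1.
    print(nscd_sketch([1.0, 2.0], [1.0, 2.0], mean, std))
    print(nscd_sketch([1.0, 0.0], [0.0, 1.0], mean, std))
```

Under these assumptions, standardizing first keeps channels with large raw variance from dominating the angle between codes, and every operation is differentiable, which is consistent with the TL;DR's description of NSCD as usable as a training or optimization objective.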
