Improving generative adversarial network inversion via fine-tuning GAN encoders
Cheng Yu, Wenmin Wang, Roberto Bugiolacchi
TL;DR
The paper tackles the challenge of inverting real images across diverse GAN architectures, where existing methods largely specialize in StyleGAN. It proposes self-supervised pre-training of encoders with an adaptive block and cropping-attention losses, followed by fine-tuning on real images with latent regularization. Across StyleGAN, PGGAN, and BigGAN, the approach achieves higher inversion quality than state-of-the-art baselines and enables editing of real faces. Ablation studies confirm the importance of the cropping attention and SSIM, and the authors note limitations in preserving fine accessories, which they attribute to current generator capabilities.
Abstract
Generative adversarial networks (GANs) can synthesize high-quality (HQ) images, and GAN inversion is a technique for mapping given images back to latent space. Existing methods work on StyleGAN inversion, but their performance is limited and they do not generalize to other GANs. To address these issues, we propose a self-supervised method to pre-train and fine-tune GAN encoders. First, we design an adaptive block to fit different encoder architectures for inverting diverse GANs. Then we pre-train GAN encoders using synthesized images and emphasize local regions through image cropping. Finally, we fine-tune the pre-trained GAN encoder for inverting real images. Compared with state-of-the-art methods, our method achieves better results, reconstructing high-quality images on mainstream GANs. Our code and pre-trained models are available at: https://github.com/disanda/Deep-GAN-Encoders.
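
The abstract mentions emphasizing local regions through cropping. As a rough illustration of that idea only, the sketch below adds a second reconstruction term computed on a cropped copy of each image, so errors inside the crop are counted twice. It is not the authors' implementation: the center crop, the use of mean squared error, and the names `center_crop`, `mse`, `crop_weighted_loss`, `fraction`, and `crop_weight` are all assumptions made for this example, and images are plain nested lists to keep it dependency-free.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed format)


def center_crop(img: Image, fraction: float) -> Image:
    """Return the central region spanning `fraction` of each side."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * fraction)), max(1, int(w * fraction))
    top, left = (h - ch) // 2, (w - cw) // 2
    return [row[left:left + cw] for row in img[top:top + ch]]


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two images of equal size."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def crop_weighted_loss(target: Image, recon: Image,
                       fraction: float = 0.5,
                       crop_weight: float = 1.0) -> float:
    """Full-image error plus a weighted error on the center crop.

    Hypothetical loss: the crop location, metric, and weight are
    illustrative choices, not values taken from the paper.
    """
    full = mse(target, recon)
    local = mse(center_crop(target, fraction), center_crop(recon, fraction))
    return full + crop_weight * local


# Usage: the same one-pixel error costs more inside the crop than outside it.
target = [[0.0] * 4 for _ in range(4)]
center_err = [row[:] for row in target]
center_err[1][1] = 1.0   # error inside the central 2x2 region
corner_err = [row[:] for row in target]
corner_err[0][0] = 1.0   # error outside the central region

loss_center = crop_weighted_loss(target, center_err)
loss_corner = crop_weighted_loss(target, corner_err)
```

With these inputs the center error yields a larger loss than the corner error, which is the intended effect of weighting a cropped region. In an actual encoder training loop the same pattern would be applied to tensors, with the metric swapped for whatever reconstruction terms the method uses.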
