From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning
Collin Guo, Yi Qian
TL;DR
This work addresses robust unpaired face manipulation by advancing from autoencoder baselines to a guided CycleGAN framework. The framework combines spectral normalization, identity- and perceptual-guided losses, semantic and landmark-aware cycle weighting, and patchwise contrastive regularization with attention and multi-scale discriminators to preserve identity and facial geometry across pose and illumination changes. Quantitative results show improvements over autoencoders in realism and perceptual quality (e.g., FID, LPIPS) and in identity preservation (ID-Sim), approaching pix2pix performance on curated paired subsets without requiring paired data. The proposed method also converges faster and more stably and offers greater flexibility for unpaired face manipulation, making it a practical direction for scalable real-world systems.
Abstract
Human face synthesis and manipulation are increasingly important in entertainment and AI, with growing demand for highly realistic, identity-preserving images even when only unpaired, unaligned datasets are available. We study unpaired face manipulation via adversarial learning, moving from autoencoder baselines to a robust, guided CycleGAN framework. While autoencoders capture coarse identity, they often miss fine details. Our approach integrates spectral normalization for stable training, identity- and perceptual-guided losses to preserve subject identity and high-level structure, and landmark-weighted cycle constraints to maintain facial geometry across pose and illumination changes. Experiments show that our adversarially trained CycleGAN improves realism (FID), perceptual quality (LPIPS), and identity preservation (ID-Sim) over autoencoders, with competitive cycle-reconstruction SSIM and practical inference times. It achieves high quality without paired datasets and approaches pix2pix performance on curated paired subsets. These results demonstrate that guided, spectrally normalized CycleGANs provide a practical path from autoencoders to robust unpaired face manipulation.
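To make the idea of a landmark-weighted cycle constraint concrete, the following is a minimal, dependency-free sketch of one way such a loss could be computed. It is an illustration under stated assumptions, not the formulation used in this work: the function names `landmark_weight_map` and `weighted_cycle_l1`, the Gaussian weighting, and the parameters `sigma` and `boost` are all hypothetical, and a practical version would operate on batched tensors in a deep learning framework rather than on nested lists.

```python
import math


def landmark_weight_map(height, width, landmarks, sigma=2.0, boost=4.0):
    """Build per-pixel weights: 1.0 everywhere, raised near each landmark.

    `landmarks` is a list of (row, col) positions. The Gaussian bump and the
    `sigma` / `boost` values are illustrative assumptions.
    """
    weights = [[1.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Strongest Gaussian response over all landmarks at this pixel.
            bump = max(
                (
                    math.exp(-((r - lr) ** 2 + (c - lc) ** 2) / (2.0 * sigma ** 2))
                    for lr, lc in landmarks
                ),
                default=0.0,
            )
            weights[r][c] += boost * bump
    return weights


def weighted_cycle_l1(image, reconstruction, weights):
    """Weight-normalized mean absolute error between an image and its
    cycle reconstruction (both given as 2D lists of floats)."""
    numerator = 0.0
    denominator = 0.0
    for img_row, rec_row, w_row in zip(image, reconstruction, weights):
        for p, q, w in zip(img_row, rec_row, w_row):
            numerator += w * abs(p - q)
            denominator += w
    return numerator / denominator


# Toy 5x5 grayscale example with a single hypothetical landmark at the center.
weights = landmark_weight_map(5, 5, landmarks=[(2, 2)])
image = [[0.0] * 5 for _ in range(5)]

# Same-size reconstruction error placed at the landmark vs. at a corner.
error_at_landmark = [row[:] for row in image]
error_at_landmark[2][2] = 1.0
error_at_corner = [row[:] for row in image]
error_at_corner[0][0] = 1.0

loss_landmark = weighted_cycle_l1(image, error_at_landmark, weights)
loss_corner = weighted_cycle_l1(image, error_at_corner, weights)
```

In this toy setup an identical pixel error is penalized more heavily at the landmark than at the image corner, which is the intended effect of weighting the cycle term toward facial geometry.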
