From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning

Collin Guo, Yi Qian

TL;DR

This work addresses robust unpaired face manipulation by advancing from autoencoder baselines to a guided CycleGAN framework. It integrates spectral normalization, identity- and perceptual-guided losses, semantic and landmark-aware cycle weighting, and patchwise contrastive regularization, along with attention and multi-scale discriminators, to preserve identity and facial geometry across pose and illumination. Quantitative results show improvements in realism and perceptual quality (e.g., FID, LPIPS) and identity preservation (ID-Sim) over autoencoders, approaching pix2pix performance on curated paired subsets without requiring pairing. The proposed method demonstrates faster, more stable convergence and greater flexibility for unpaired face manipulation, offering a practical direction for scalable, production-ready systems with real-world applicability.

Abstract

Human face synthesis and manipulation are increasingly important in entertainment and AI, with a growing demand for highly realistic, identity-preserving images even when only unpaired, unaligned datasets are available. We study unpaired face manipulation via adversarial learning, moving from autoencoder baselines to a robust, guided CycleGAN framework. While autoencoders capture coarse identity, they often miss fine details. Our approach integrates spectral normalization for stable training, identity- and perceptual-guided losses to preserve subject identity and high-level structure, and landmark-weighted cycle constraints to maintain facial geometry across pose and illumination changes. Experiments show that our adversarially trained CycleGAN improves realism (FID), perceptual quality (LPIPS), and identity preservation (ID-Sim) over autoencoders, with competitive cycle-reconstruction SSIM and practical inference times. It achieves high quality without paired datasets and approaches pix2pix on curated paired subsets. These results demonstrate that guided, spectrally normalized CycleGANs provide a practical path from autoencoders to robust unpaired face manipulation.
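
To make the idea of a landmark-weighted cycle constraint concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the exact weighting, so the Gaussian bumps around landmarks, the `sigma` and `boost` parameters, and the function names `landmark_weight_map` and `weighted_cycle_l1` are all illustrative assumptions. The sketch only shows how a per-pixel weight map can make cycle-reconstruction errors near facial landmarks cost more than errors elsewhere.

```python
import numpy as np


def landmark_weight_map(height, width, landmarks, sigma=3.0, boost=4.0):
    """Build a per-pixel weight map (hypothetical scheme, not from the paper).

    Every pixel starts at weight 1; each (row, col) landmark adds a Gaussian
    bump of height `boost` and width `sigma`.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    weights = np.ones((height, width), dtype=np.float64)
    for row, col in landmarks:
        dist_sq = (ys - row) ** 2 + (xs - col) ** 2
        weights += boost * np.exp(-dist_sq / (2.0 * sigma ** 2))
    return weights


def weighted_cycle_l1(original, reconstructed, weights):
    """Weighted mean absolute error between x and its cycle reconstruction.

    `original` and `reconstructed` have shape (H, W, C); `weights` has
    shape (H, W). The result is normalized by the total weight, so a
    spatially uniform error gives the same value as plain L1.
    """
    per_pixel = np.abs(original - reconstructed).mean(axis=-1)
    return float((weights * per_pixel).sum() / weights.sum())


# Toy example: a 16x16 RGB image with one assumed landmark at the center.
weights = landmark_weight_map(16, 16, [(8, 8)])
image = np.zeros((16, 16, 3))

error_near = image.copy()
error_near[8, 8] = 1.0  # reconstruction error exactly on the landmark

error_far = image.copy()
error_far[0, 0] = 1.0  # same-sized error in a corner, far from the landmark

loss_near = weighted_cycle_l1(image, error_near, weights)
loss_far = weighted_cycle_l1(image, error_far, weights)
print(loss_near > loss_far)
```

In a full training loop this term would stand in for the uniform L1 cycle loss, with `reconstructed` produced by passing an image through both generators; the landmark coordinates would come from an external detector, which is outside the scope of this sketch.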

Paper Structure

This paper contains 19 sections, 13 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: GAN architecture and compute process.
  • Figure 2: Deep face manipulation examples from the 2024 U.S. Presidential Debate.
  • Figure 3: CycleGAN architecture used in our experiments: 9-block ResNet generators with skip connections and optional attention; PatchGAN discriminators with spectral normalization and multi-scale evaluation.
  • Figure 4: pix2pix results with landmark-based conditioning. Clear images are generated but fail under strong pose mismatches.
  • Figure 5: Examples of autoencoder-based manipulation: masks, feature synthesis, and face swapping.
  • ...and 2 more figures