On Unsupervised Image-to-image translation and GAN stability

BahaaEddin AlAila, Zahra Jandaghi, Abolfazl Farahani, Mohammad Ziad Al-Saad

TL;DR

This work analyzes failure modes of CycleGAN and argues that unsupervised image-to-image translation is ill-posed due to manifold alignment and mode-collapse risks. It proposes two general approaches to increase stability and reduce reliance on dual-GAN architectures: a compact 1-GAN model with a Wasserstein loss and a GAN-free framework based on β-VAE/Sinkhorn autoencoders with cycle constraints. Across toy MNIST tasks and preliminary Cityscapes experiments, the authors observe that while cycle-consistent translations can be achieved, the mappings often collapse to arbitrary, domain-like variations, revealing persistent manifold alignment and prior-hole issues. The work highlights the need for stronger constraints and proposes directions such as attention mechanisms, refined priors (e.g., VampPrior), and operator-centric encodings to improve both stability and fidelity in unsupervised I2I translation, with potential for more compact architectures than CycleGAN. The practical impact lies in guiding the design of stable, data-efficient translation systems for diverse vision tasks while clarifying fundamental limitations of current unsupervised approaches.
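The ill-posedness claim can be illustrated with a small sketch. It is not the authors' experiment: it assumes a hypothetical toy setting where both domains are the same four discrete labels, and the names `cycle_loss`, `invert`, and `zero_loss_maps` are made up for illustration. Under those assumptions, every bijection between the domains, paired with its inverse, satisfies the cycle constraint exactly, so cycle consistency alone cannot single out the intended translation.

```python
from itertools import permutations


def cycle_loss(G, F, xs, ys):
    """Mean L1 cycle error: |F(G(x)) - x| over xs plus |G(F(y)) - y| over ys."""
    forward = sum(abs(F[G[x]] - x) for x in xs) / len(xs)
    backward = sum(abs(G[F[y]] - y) for y in ys) / len(ys)
    return forward + backward


def invert(mapping):
    """Inverse of a bijective dict mapping."""
    return {v: k for k, v in mapping.items()}


# Hypothetical toy domains: four discrete "images" on each side.
domain = [0, 1, 2, 3]

# Collect every translator G (with F = G^-1) that has zero cycle loss.
zero_loss_maps = []
for perm in permutations(domain):
    G = dict(zip(domain, perm))
    F = invert(G)
    if cycle_loss(G, F, domain, domain) == 0:
        zero_loss_maps.append(G)

print(len(zero_loss_maps))  # 24: all 4! bijections are cycle-consistent
```

Only one of these 24 mappings would be the "intended" translation. The rest are the kind of arbitrary but cycle-consistent mappings the summary describes, which is why additional constraints are needed.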

Abstract

The problem of image-to-image translation is both intriguing and challenging because of its potential impact on a wide variety of computer vision applications such as colorization, inpainting, and segmentation. Given the high level of sophistication needed to extract patterns from one domain and successfully apply them to another, especially in a completely unsupervised (unpaired) manner, this problem has gained much attention over the last few years. It is one of the first problems where successful applications of deep generative models, and especially Generative Adversarial Networks, achieved astounding results that are actually of real-world impact, rather than just a show of theoretical prowess of the kind that has been dominating the GAN world. In this work, we study some of the failure cases of a seminal work in the field, CycleGAN [1], hypothesize that they are related to GAN stability, and propose two general models to try to alleviate these problems. We also reach the conclusion, which has been circulating in the literature lately, that the problem is ill-posed.

Paper Structure

This paper contains 25 sections, 10 equations, 15 figures, and 1 algorithm.

Figures (15)

  • Figure 1: Failure instances reported by CycleGAN [1]
  • Figure 2: The high-level structure of our 1-GAN model
  • Figure 3: The Sequential $\beta$-VAE model
  • Figure 4: The Interleaving $\beta$-VAE model
  • Figure 5: The Aligned $\beta$-VAE encoding model
  • ...and 10 more figures