
Learning to Discover Cross-Domain Relations with Generative Adversarial Networks

Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, Jiwon Kim

TL;DR

This work tackles unsupervised discovery of cross-domain relations from unpaired image datasets by proposing DiscoGAN, a dual-GAN framework with reconstruction losses that enforce bidirectional, bijective mappings between two domains. By coupling two GANs and introducing two reconstruction (cycle-consistency) terms, one per translation direction, DiscoGAN mitigates mode collapse and learns robust, invertible translations across diverse domain pairs. The approach is validated on toy and real-image tasks (e.g., cars, faces, chairs, edges-to-photos, and handbags↔shoes), demonstrating accurate, attribute-preserving cross-domain mappings without explicit pair labels. Overall, DiscoGAN enables natural style transfer and relation discovery in settings where paired data is scarce or unavailable, with broad applicability to image-to-image translation and multi-domain learning.
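The TL;DR describes the DiscoGAN objective in words. What follows is a minimal, framework-free sketch of how the losses could be composed, under stated assumptions: generators and discriminators are plain Python callables on short float vectors (stand-ins for networks), the reconstruction distance is mean squared error (one possible choice), and the adversarial terms use the standard -log D form. The `variant` switch reflects our reading of the three models in Figure 2 (standard GAN, GAN with one reconstruction term, DiscoGAN), which is an interpretation rather than the official implementation. All function names and the labels L_GAN_B, L_CONST_A, etc. are hypothetical and used here only for readability.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Generator = Callable[[Vector], Vector]
Discriminator = Callable[[Vector], float]  # assumed to return P(real) in (0, 1)

EPS = 1e-12  # guards against log(0)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Reconstruction distance d(., .); MSE is one possible choice (assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def adv_gen(d_out: float) -> float:
    """Generator adversarial term: -log D(G(x))."""
    return -math.log(max(d_out, EPS))


def generator_loss(batch_a, batch_b, g_ab: Generator, g_ba: Generator,
                   d_a: Discriminator, d_b: Discriminator,
                   variant: str = "discogan") -> float:
    """Generator objective for three model variants (hypothetical sketch).

    "gan"       -> L_GAN_B only
    "gan_recon" -> L_GAN_B + L_CONST_A  (one direction only)
    "discogan"  -> L_GAN_B + L_CONST_A + L_GAN_A + L_CONST_B
    """
    if variant not in ("gan", "gan_recon", "discogan"):
        raise ValueError(f"unknown variant: {variant}")
    x_ab = [g_ab(x) for x in batch_a]                    # A -> B
    loss = mean([adv_gen(d_b(x)) for x in x_ab])         # L_GAN_B
    if variant == "gan":
        return loss
    x_aba = [g_ba(x) for x in x_ab]                      # A -> B -> A
    loss += mean([mse(r, x) for r, x in zip(x_aba, batch_a)])  # L_CONST_A
    if variant == "gan_recon":
        return loss
    x_ba = [g_ba(x) for x in batch_b]                    # B -> A
    x_bab = [g_ab(x) for x in x_ba]                      # B -> A -> B
    loss += mean([adv_gen(d_a(x)) for x in x_ba])        # L_GAN_A
    loss += mean([mse(r, x) for r, x in zip(x_bab, batch_b)])  # L_CONST_B
    return loss


def discriminator_loss(batch_a, batch_b, g_ab: Generator, g_ba: Generator,
                       d_a: Discriminator, d_b: Discriminator) -> float:
    """L_D_A + L_D_B, assuming standard binary cross-entropy per discriminator."""
    def bce(d: Discriminator, real, fake) -> float:
        return (mean([-math.log(max(d(x), EPS)) for x in real])
                + mean([-math.log(max(1.0 - d(x), EPS)) for x in fake]))

    fake_b = [g_ab(x) for x in batch_a]  # translated A samples judged by D_B
    fake_a = [g_ba(x) for x in batch_b]  # translated B samples judged by D_A
    return bce(d_a, batch_a, fake_a) + bce(d_b, batch_b, fake_b)
```

For example, if `g_ab` and `g_ba` both negate their input, every cycle returns the original sample, so both reconstruction terms vanish; with constant discriminators that output 0.5, the DiscoGAN generator loss reduces to 2·log 2. Switching to `variant="gan_recon"` drops the B→A→B branch, so nothing constrains the reverse direction, which is the gap the second reconstruction term is meant to close.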

Abstract

While humans easily recognize relations between data from different domains without any supervision, learning to automatically discover them is in general very challenging and needs many ground-truth pairs that illustrate the relations. To avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data. We propose a method based on generative adversarial networks that learns to discover relations between different domains (DiscoGAN). Using the discovered relations, our proposed network successfully transfers style from one domain to another while preserving key attributes such as orientation and face identity. Source code for the official implementation is publicly available at https://github.com/SKTBrain/DiscoGAN

Paper Structure

This paper contains 16 sections, 4 equations, 9 figures.

Figures (9)

  • Figure 1: Our GAN-based model trains on two independently collected sets of images and learns a mapping between the two domains without any extra labels. In this paper, we reduce this problem to generating a new image of one domain given an image from the other domain. (a) shows a high-level overview of the training procedure of our model with two independent sets (e.g. handbag images and shoe images). (b) and (c) show results of our method. Our method takes a handbag (or shoe) image as input and generates the corresponding shoe (or handbag) image. Again, it is worth noting that our method does not use any extra annotated supervision and can self-discover relations between domains.
  • Figure 2: Three investigated models. (a) standard GAN [goodfellow2014generative], (b) GAN with a reconstruction loss, (c) our proposed model (DiscoGAN) designed to discover relations between two unpaired, unlabeled datasets. Details are described in Section 3.
  • Figure 3: Illustration of our models on simplified one-dimensional domains. (a) Ideal mapping from domain A to domain B, in which the two domain A modes map to two different domain B modes, (b) GAN model failure case, (c) GAN with reconstruction model failure case.
  • Figure 4: Toy domain experiment results. The colored background shows the output value of the discriminator. 'x' marks denote the modes of domain B, and colored circles indicate samples from domain A mapped to domain B, where each color corresponds to a different mode. (a) Ten target domain modes and initial translations, (b) standard GAN model, (c) GAN with reconstruction loss, (d) our proposed model DiscoGAN.
  • Figure 5: Car to Car translation experiment. Horizontal and vertical axes in the plots indicate the predicted azimuth angles of the input and translated images, where the input angle ranges from -75$^\circ$ to 75$^\circ$. The RMSE with respect to the ground truth (blue lines) is shown in each plot. Images in the second row are example input car images ranging from -75$^\circ$ to 75$^\circ$ at 15$^\circ$ intervals. Images in the third row are the corresponding translated images. (a) Standard GAN, (b) GAN with reconstruction, (c) DiscoGAN. The angles of input and output images are highly correlated when our proposed DiscoGAN model is used. Note that the angles of input and translated car images are reversed with respect to 0$^\circ$ (i.e. mirror images).
  • ...and 4 more figures
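Figure 5 scores each model by the RMSE between predicted azimuth angles and a ground-truth line. A minimal sketch of that metric follows, under two assumptions: angle predictions for the input and translated images are already available (the estimator that produces them is not modeled here), and the reference line is the mirrored angle, following the caption's note that translated cars are mirror images of the inputs. The helper names `mirrored` and `azimuth_rmse` are hypothetical, and the `collapsed` outputs are synthetic, not results from the paper.

```python
import math
from typing import Callable, Sequence


def mirrored(angle_deg: float) -> float:
    """Assumed ground truth: the translated car mirrors the input's azimuth."""
    return -angle_deg


def azimuth_rmse(
    input_deg: Sequence[float],
    output_deg: Sequence[float],
    reference: Callable[[float], float] = mirrored,
) -> float:
    """RMSE in degrees between predicted output angles and reference(input)."""
    if len(input_deg) != len(output_deg) or not input_deg:
        raise ValueError("need equal-length, non-empty angle sequences")
    squared = [(out - reference(inp)) ** 2 for inp, out in zip(input_deg, output_deg)]
    return math.sqrt(sum(squared) / len(squared))


# Input angles as in Figure 5: -75 to 75 degrees at 15-degree intervals (11 values).
inputs = [float(a) for a in range(-75, 76, 15)]

perfect = [-a for a in inputs]     # ideal mirrored translation
collapsed = [0.0 for _ in inputs]  # synthetic: every output at 0 degrees

print(azimuth_rmse(inputs, perfect))    # 0.0
print(azimuth_rmse(inputs, collapsed))  # sqrt(2250), about 47.43
```

A translation that ignores input orientation, like the synthetic `collapsed` outputs, is penalized in proportion to the spread of input angles, so a low RMSE indicates that orientation survives the translation. Passing `reference=lambda a: a` scores against the identity line instead, if the mirrored reading of the ground truth turns out to be wrong.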