U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

Junho Kim, Minjae Kim, Hyeonwoo Kang, Kwanghee Lee

TL;DR

U-GAT-IT introduces an unsupervised image-to-image translation framework that integrates a CAM-based attention mechanism into both generator and discriminator, guiding region-specific transformations. It further adopts AdaLIN, a learnable adaptive normalization that balances instance and layer statistics to flexibly control shape and texture changes. The approach demonstrates superior or competitive performance across diverse datasets without architectural changes or hyper-parameter tuning, supported by ablations on attention and normalization, qualitative assessments, and quantitative metrics like KID. The work contributes a robust, flexible method for translations requiring holistic and localized geometric changes, with publicly available code and datasets.
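The CAM-based attention described above can be illustrated with a small sketch. It assumes a simplified setting: a single linear auxiliary classifier over globally average-pooled feature maps, whose per-channel weights are reused to re-weight the features. The function name `cam_attention` and its arguments are hypothetical and chosen for illustration only; the released implementations may differ in detail.

```python
import numpy as np

def cam_attention(features, weights):
    """Simplified CAM-style attention (illustrative assumption, not the authors' code).

    features: array of shape (C, H, W), encoder feature maps.
    weights:  array of shape (C,), weights of a linear auxiliary classifier.
    """
    # Global average pooling: one scalar per channel.
    pooled = features.mean(axis=(1, 2))
    # Auxiliary classifier score (source vs. target domain).
    logit = float(np.dot(weights, pooled))
    # Re-weight each feature map by its classifier weight.
    attended = weights[:, None, None] * features
    # Summing over channels gives a spatial attention heatmap of shape (H, W).
    heatmap = attended.sum(axis=0)
    return logit, attended, heatmap
```

Channels with large classifier weights are the ones most useful for telling the two domains apart, so re-weighting by those weights emphasizes the regions the translation should focus on.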

Abstract

We propose a novel method for unsupervised image-to-image translation, which incorporates a new attention module and a new learnable normalization function in an end-to-end manner. The attention module guides our model to focus on the more important regions that distinguish the source and target domains, based on the attention map obtained by the auxiliary classifier. Unlike previous attention-based methods, which cannot handle geometric changes between domains, our model can translate both images requiring holistic changes and images requiring large shape changes. Moreover, our new AdaLIN (Adaptive Layer-Instance Normalization) function helps our attention-guided model flexibly control the amount of change in shape and texture through parameters learned for each dataset. Experimental results show the superiority of the proposed method over existing state-of-the-art models when using a fixed network architecture and fixed hyper-parameters. Our code and datasets are available at https://github.com/taki0112/UGATIT or https://github.com/znxlwm/UGATIT-pytorch.
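To make the AdaLIN idea concrete, here is a minimal NumPy sketch. It assumes the commonly stated formulation: the output is `gamma * (rho * x_IN + (1 - rho) * x_LN) + beta`, where `x_IN` and `x_LN` are the instance-normalized and layer-normalized activations and `rho` is a learned mixing weight clipped to [0, 1]. The function name `adalin`, the argument shapes, and the `eps` default are assumptions made for illustration; in the model, `gamma` and `beta` are produced by the network rather than passed in by hand.

```python
import numpy as np

def adalin(x, gamma, beta, rho, eps=1e-5):
    """Sketch of Adaptive Layer-Instance Normalization (illustrative assumption).

    x:     activations of shape (N, C, H, W).
    gamma: per-sample, per-channel scale of shape (N, C).
    beta:  per-sample, per-channel shift of shape (N, C).
    rho:   per-channel mixing weight of shape (C,), clipped to [0, 1].
    """
    # Instance normalization: statistics per sample and per channel.
    in_mean = x.mean(axis=(2, 3), keepdims=True)
    in_var = x.var(axis=(2, 3), keepdims=True)
    x_in = (x - in_mean) / np.sqrt(in_var + eps)

    # Layer normalization: statistics per sample, across all channels.
    ln_mean = x.mean(axis=(1, 2, 3), keepdims=True)
    ln_var = x.var(axis=(1, 2, 3), keepdims=True)
    x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)

    # rho = 1 gives pure instance norm, rho = 0 gives pure layer norm.
    rho = np.clip(rho, 0.0, 1.0).reshape(1, -1, 1, 1)
    mixed = rho * x_in + (1.0 - rho) * x_ln

    return gamma[:, :, None, None] * mixed + beta[:, :, None, None]
```

With `rho` near 1 the layer behaves like instance normalization, and with `rho` near 0 it behaves like layer normalization; learning `rho` per dataset is what lets the model adjust how much shape and texture change.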

Paper Structure

This paper contains 25 sections, 7 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: The model architecture of U-GAT-IT. The detailed notations are described in the Model section.
  • Figure 2: Visualization of the attention maps and their effects shown in the ablation experiments: (a) Source images, (b) Attention map of the generator, (c-d) Local and global attention maps of the discriminator, respectively. (e) Our results with CAM, (f) Results without CAM.
  • Figure 3: Comparison of the results using each normalization function: (a) Source images, (b) Our results, (c) Results only using IN in decoder with CAM, (d) Results only using LN in decoder with CAM, (e) Results only using AdaIN in decoder with CAM, (f) Results only using GN in decoder with CAM.
  • Figure 4: Visual comparisons on the five datasets. From top to bottom: selfie2anime, horse2zebra, cat2dog, photo2portrait, and photo2vangogh. (a) Source images, (b) U-GAT-IT, (c) CycleGAN, (d) UNIT, (e) MUNIT, (f) DRIT, (g) AGGAN.
  • Figure 5: Visual comparisons on selfie2anime with attention feature maps. (a) Source images, (b) Attention map of the generator, (c-d) Local and global attention maps of the discriminators, (e) Our results, (f) CycleGAN (ref31), (g) UNIT (ref18), (h) MUNIT (ref10), (i) DRIT (ref32), (j) AGGAN (ref41), (k) CartoonGAN (ref42).
  • ...and 7 more figures