Improving Deep Generative Models on Many-To-One Image-to-Image Translation

Sagar Saxena, Mohammad Nayeem Teli

TL;DR

This work addresses many-to-one image-to-image translation by proposing an asymmetric optimization framework that respects the asymmetry between domains. The authors adapt StarGAN V2 with HMU/HMS modifications, domain-specific encoding/decoding paths, and weight demodulation, and they introduce the Colorized MNIST dataset together with the Color Recall metric for interpretable evaluation of diversity and fidelity. Empirical results on Colorized MNIST and ADE20K show improved joint performance across uni-modal and multi-modal domains, with some limitations on semantic segmentation tasks. The contributions offer practical guidelines for designing deep generative models that handle asymmetric domain relationships, along with interpretable benchmarks for future work in this area.

Abstract

Deep generative models have been applied to many image-to-image translation tasks. Generative Adversarial Networks and Diffusion Models have produced impressive results, setting a new state of the art on these tasks. Most methods use a symmetric setup across the domains in a dataset: they assume that either every domain has multiple modalities or every domain has only one. However, many datasets have a many-to-one relationship between two domains. In this work, we first introduce a Colorized MNIST dataset and a Color Recall score that provide a simple benchmark for evaluating models on many-to-one translation. We then introduce a new asymmetric framework to improve existing deep generative models on many-to-one image-to-image translation. We apply this framework to StarGAN V2 and show that, in both unsupervised and semi-supervised settings, the resulting model performs better on many-to-one image-to-image translation.
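
To make the many-to-one relationship concrete, here is a minimal sketch of how a colorized digit dataset could relate to its grayscale source. It is an illustration under stated assumptions, not the authors' construction of Colorized MNIST: the toy 4x4 stroke mask, the helper names `colorize` and `decolorize`, and the color range [0.2, 1.0] are all hypothetical. The only point it shows is that many differently colored images collapse to a single uncolored image, while the reverse direction has many valid outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 4x4 "digit": a binary stroke mask standing in for an MNIST image
# (hypothetical stand-in; the real dataset uses 28x28 MNIST digits).
digit = np.zeros((4, 4), dtype=np.float32)
digit[1:3, 1] = 1.0
digit[2, 1:3] = 1.0

def colorize(gray, color):
    """Tint an (H, W) image with an RGB color, giving an (H, W, 3) image."""
    return gray[..., None] * np.asarray(color, dtype=np.float32)

def decolorize(rgb):
    """Map a tinted image back to its stroke mask (the 'one' side)."""
    return (rgb.max(axis=-1) > 0).astype(np.float32)

# Sample several colors uniformly (assumed range keeps every channel nonzero).
colors = rng.uniform(0.2, 1.0, size=(5, 3))
colored = [colorize(digit, c) for c in colors]      # one -> many
recovered = [decolorize(img) for img in colored]    # many -> one
```

Every entry of `recovered` equals `digit`, even though the entries of `colored` differ from one another. A symmetric model that treats both directions identically has to ignore this difference between the two domains.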

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, 4 tables, 2 algorithms.

Figures (6)

  • Figure 1: Different relationships between domains on trans-attr
  • Figure 2: 5 paired samples from the Colorized MNIST dataset.
  • Figure 3: 5 samples from the ADE20K training set.
  • Figure 4: Red, Green, and Blue Color Recall histograms. The black dashed line represents the real uniform distribution of color values.
  • Figure 5: 5 randomly sampled translations on the test set from Colorized MNIST images to MNIST images (left) and from MNIST images to Colorized MNIST images (right).
  • ...and 1 more figures