Improving Deep Generative Models on Many-To-One Image-to-Image Translation
Sagar Saxena, Mohammad Nayeem Teli
TL;DR
This work addresses many-to-one image-to-image translation by proposing an asymmetric optimization framework that accounts for the asymmetry between domains. The authors adapt StarGAN V2 with HMU/HMS modifications, add domain-specific encoding/decoding paths and weight demodulation, and introduce the Colorized MNIST dataset together with the Color-Recall metric to provide an interpretable evaluation of diversity and fidelity. Empirical results on Colorized MNIST and ADE20K show improved joint performance across uni-modal and multi-modal domains, though with some limitations on semantic segmentation tasks. The contributions offer practical guidelines for designing deep generative models that handle asymmetric domain relationships, as well as interpretable benchmarks for future work in this area.
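Weight demodulation is listed above as one of the modifications. For reference, the sketch below follows the standard formulation introduced in StyleGAN2: each convolution weight is first scaled by a per-input-channel style $s_i$, then every output channel is renormalized to unit L2 norm, i.e. $w''_{o,i,k} = s_i w_{o,i,k} / \sqrt{\sum_{i,k} (s_i w_{o,i,k})^2 + \epsilon}$. How the authors wire this into their domain-specific paths is not described in this section; the function name `modulate_demodulate`, the nested-list layout `[out][in][kernel]`, and the value of `eps` are illustrative assumptions.

```python
import math


def modulate_demodulate(weight, style, eps=1e-8):
    """Hypothetical helper: StyleGAN2-style weight modulation + demodulation.

    weight: nested list indexed [out_channel][in_channel][kernel_element]
    style:  list of per-input-channel scales s_i
    Returns weights where each output channel has (approximately) unit L2 norm.
    """
    result = []
    for w_o in weight:
        # Modulation: scale every input channel i of this filter by s_i.
        modulated = [[style[i] * w for w in w_oi] for i, w_oi in enumerate(w_o)]
        # Demodulation: divide by the L2 norm over input channels and kernel.
        sq_norm = sum(v * v for row in modulated for v in row)
        d = 1.0 / math.sqrt(sq_norm + eps)
        result.append([[v * d for v in row] for row in modulated])
    return result


# Usage: two output channels, two input channels, kernels flattened to length 2.
weight = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, -0.5], [1.0, 0.0]]]
style = [2.0, 0.5]
demod = modulate_demodulate(weight, style)
for w_o in demod:
    # Each output channel's squared norm is ~1 (slightly below due to eps).
    print(sum(v * v for row in w_o for v in row))
```

Demodulating the weights, rather than normalizing activations as AdaIN does, keeps the expected output scale fixed per output channel while still letting the style code control per-channel emphasis.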
Abstract
Deep generative models have been applied to many image-to-image translation tasks. Generative Adversarial Networks and Diffusion Models have achieved impressive results, setting new state-of-the-art performance on these tasks. Most methods use a symmetric setup across the domains in a dataset: they assume that all domains have either multiple modalities or only one modality. However, many datasets exhibit a many-to-one relationship between two domains. In this work, we first introduce a Colorized MNIST dataset and a Color-Recall score that together provide a simple benchmark for evaluating models on many-to-one translation. We then introduce a new asymmetric framework to improve existing deep generative models on many-to-one image-to-image translation. We apply this framework to StarGAN V2 and show that, in both unsupervised and semi-supervised settings, the resulting model performs better on many-to-one image-to-image translation.
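The exact definition of the Color-Recall score is not given in this section. Purely as a hypothetical illustration of what a recall-style diversity measure over colors could look like, the sketch below assumes a fixed palette of target colors (`PALETTE`), takes the mean RGB of foreground pixels (brightness `threshold` is an assumption) as each generated digit's color, snaps it to the nearest palette entry, and reports the fraction of palette colors that appear at least once. None of these names or choices come from the paper.

```python
# Hypothetical palette; the real Colorized MNIST color set may differ.
PALETTE = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
}


def dominant_color(pixels, threshold=50):
    """Mean RGB of foreground pixels (any channel above threshold), or None."""
    fg = [p for p in pixels if max(p) > threshold]
    if not fg:
        return None
    n = len(fg)
    return tuple(sum(p[c] for p in fg) / n for c in range(3))


def nearest_palette_color(rgb, palette=PALETTE):
    """Name of the palette color with the smallest squared RGB distance."""
    return min(palette, key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, palette[name])))


def color_recall(samples, palette=PALETTE, threshold=50):
    """Fraction of palette colors produced by at least one generated sample.

    samples: list of images, each a flat list of (R, G, B) tuples.
    """
    hit = set()
    for pixels in samples:
        rgb = dominant_color(pixels, threshold)
        if rgb is not None:
            hit.add(nearest_palette_color(rgb, palette))
    return len(hit) / len(palette)


# Usage: three tiny "images" with a black background and one colored stroke.
samples = [
    [(0, 0, 0), (250, 10, 5)],   # reddish digit
    [(0, 0, 0), (5, 240, 10)],   # greenish digit
    [(10, 10, 250)],             # bluish digit
]
print(color_recall(samples))  # 3 of 4 palette colors covered
```

Under this illustrative definition, a generator that collapses onto a single color in the many-modality domain would score 1/|palette| regardless of image quality, which is the kind of diversity failure a recall-style score is meant to expose.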
