TAC-GAN - Text Conditioned Auxiliary Classifier Generative Adversarial Network

Ayushman Dash; John Cristian Borges Gamboa; Sheraz Ahmed; Marcus Liwicki; Muhammad Zeshan Afzal

TL;DR

TAC-GAN addresses text-to-image synthesis by conditioning the generator on text embeddings and training a text-aware discriminator, extending AC-GAN to condition generation on textual descriptions rather than class labels while still exploiting class information. Trained on the Oxford-102 Flowers dataset with Skip-Thought embeddings, it reaches an Inception Score of 3.45 (a relative increase of 7.8% over StackGAN) and a mean MS-SSIM of 0.14 across generated classes, which indicates highly diverse samples. It also supports content/style disentanglement and interpolation in both the noise and text spaces. The approach extends readily to additional conditioning information and could be combined with multi-stage refinement pipelines.

Abstract

In this work, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), a text-to-image Generative Adversarial Network (GAN) for synthesizing images from their text descriptions. Earlier approaches have conditioned the generative process on textual data, but combining this with class information, which is known to diversify the generated samples and improve their structural coherence, has not been explored. We trained the presented TAC-GAN model on the Oxford-102 dataset of flowers, and evaluated the discriminability of the generated images with the Inception Score, as well as their diversity using the Multi-Scale Structural Similarity Index (MS-SSIM). Our approach outperforms the state-of-the-art models: its Inception Score is 3.45, a relative increase of 7.8% over the recently introduced StackGAN. A comparison of the mean MS-SSIM scores of the training and generated samples per class shows that our approach is able to generate highly diverse images, with an average MS-SSIM of 0.14 over all generated classes.
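To make the reported numbers concrete, the following is a minimal sketch of how an Inception Score is commonly computed from a classifier's per-image class probabilities, using the standard definition exp(E_x[KL(p(y|x) || p(y))]), together with the relative-increase arithmetic. The function names are hypothetical and the code is not taken from the paper. The baseline of 3.20 is an assumption inferred from the stated 7.8% (3.45 / 1.078 ≈ 3.20); the abstract does not quote it.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities.

    probs: list of lists, one row p(y|x) per image, each row summing to 1.
    Hypothetical helper for illustration; real evaluations use Inception
    network outputs over many generated images, usually in several splits.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    # Mean KL(p(y|x) || p(y)); terms with zero probability contribute 0.
    mean_kl = sum(
        sum(pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0)
        for p in probs
    ) / n
    return math.exp(mean_kl)


def relative_increase(new, baseline):
    """Relative increase of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Assumed baseline (inferred from the 7.8% figure, not quoted in the abstract).
assumed_baseline = 3.20
gain = relative_increase(3.45, assumed_baseline)
print(f"relative increase: {gain:.1%}")
```

A higher score requires both confident per-image predictions and a balanced spread over classes, which is why the abstract pairs it with MS-SSIM as a separate diversity measure.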
