Stacked Generative Adversarial Networks
Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, Serge Belongie
TL;DR
SGAN introduces a top-down stack of GANs that inverts a pre-trained discriminative encoder by enforcing adversarial alignment of intermediate representations through representation discriminators. It adds a conditional loss to preserve higher-level conditioning and an entropy loss to promote diverse outputs via a variational lower bound on H(\hat{h}_i | h_{i+1}). Training proceeds from independent per-stack objectives to end-to-end joint optimization, enabling hierarchical decomposition of variation and conditioning on class labels. On MNIST, SVHN, and CIFAR-10, SGAN achieves higher image quality and diversity than vanilla GAN variants, with state-of-the-art Inception scores on CIFAR-10 and strong human-perceived realism in Visual Turing Tests. The work demonstrates that leveraging hierarchical discriminative representations can substantially improve generative modeling while enhancing interpretability through multi-level latent structure.
Abstract
In this paper, we propose a novel generative model named Stacked Generative Adversarial Networks (SGAN), which is trained to invert the hierarchical representations of a bottom-up discriminative network. Our model consists of a top-down stack of GANs, each trained to generate lower-level representations conditioned on higher-level representations. A representation discriminator is introduced at each level of the feature hierarchy to encourage the representation manifold of the generator to align with that of the bottom-up discriminative network, leveraging the powerful discriminative representations to guide the generative model. In addition, we introduce a conditional loss that encourages the use of conditional information from the layer above, and a novel entropy loss that maximizes a variational lower bound on the conditional entropy of generator outputs. We first train each stack independently, and then train the whole model end-to-end. Unlike the original GAN, which uses a single noise vector to represent all variations, our SGAN decomposes variations into multiple levels and gradually resolves uncertainties in the top-down generative process. Based on visual inspection, Inception scores, and a visual Turing test, we demonstrate that SGAN is able to generate images of much higher quality than GANs without stacking.
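The following is a minimal, framework-free sketch of how the three loss terms described above might be combined for a single generator in the stack, and of how the top-down generative pass could be chained. It is an illustration under stated assumptions, not the authors' implementation. All function and parameter names (`generator_loss`, `sample_top_down`, `w_cond`, `w_ent`, `noise_dim`) are hypothetical. The sketch assumes the non-saturating adversarial form -log D, squared error as the distance in the conditional loss, and a fixed-variance Gaussian for the variational posterior, so that the entropy term reduces to squared error between the sampled noise and its reconstruction, up to constants.

```python
import math
import random


def squared_error(a, b):
    """Mean squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def generator_loss(d_prob_fake, h_above, h_above_recovered, z, z_predicted,
                   w_cond=1.0, w_ent=1.0):
    """Combine the three loss terms for one generator in the stack.

    d_prob_fake:        probability the representation discriminator assigns
                        to the generated representation being real
    h_above:            higher-level representation used as conditioning
    h_above_recovered:  encoder output when applied to the generated
                        representation
    z:                  noise vector sampled for this level
    z_predicted:        noise reconstructed from the generated representation
                        (assumed Gaussian posterior with fixed variance)
    w_cond, w_ent:      hypothetical loss weights
    """
    # Adversarial term (assumed non-saturating form).
    adversarial = -math.log(d_prob_fake)
    # Conditional term: the generated output should still encode h_above.
    conditional = squared_error(h_above_recovered, h_above)
    # Entropy term: the noise should be recoverable from the output,
    # which discourages the generator from ignoring it.
    entropy = squared_error(z_predicted, z)
    return adversarial + w_cond * conditional + w_ent * entropy


def sample_top_down(generators, label, noise_dim=4, rng=random):
    """Run the stack from the top: each generator maps (h_above, z_i)
    to the next lower-level representation, ending at the image level."""
    h = label
    for g in generators:  # ordered from the top of the stack to the bottom
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        h = g(h, z)
    return h
```

In this sketch, fresh noise is drawn at every level, which mirrors the idea that variation is decomposed across the hierarchy rather than carried by a single noise vector. In a real model the scalar and list inputs would be tensors produced by trained networks.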
