Discriminative Regularization for Generative Models

Alex Lamb, Vincent Dumoulin, Aaron Courville

TL;DR

The paper asks whether the representations learned by discriminative classifiers can improve image generative models. It introduces discriminative regularization, which augments the VAE objective with terms that align reconstructions with the hidden-layer features of a pretrained classifier, with the goal of perceptually sharper, more semantically coherent samples. Experiments on SVHN, CIFAR-10, and CelebA show qualitative improvements in sample realism and identity preservation, though pixel-space likelihoods can worsen. The work highlights a two-way interaction between supervised and unsupervised learning and analyzes the artifacts that the regularization introduces.

Abstract

We explore the question of whether the representations learned by classifiers can be used to enhance the quality of generative models. Our conjecture is that labels correspond to characteristics of natural data which are most salient to humans: identity in faces, objects in images, and utterances in speech. We propose to take advantage of this by using the representations from discriminative classifiers to augment the objective function corresponding to a generative model. In particular we enhance the objective function of the variational autoencoder, a popular generative model, with a discriminative regularization term. We show that enhancing the objective function in this way leads to samples that are clearer and have higher visual quality than the samples from the standard variational autoencoders.
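
The idea of adding a discriminative regularization term to the variational autoencoder objective can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer ReLU "classifier", the squared-error reconstruction term, the squared-error feature penalty, and the names `toy_classifier_features`, `regularized_vae_loss`, and `gammas` are all hypothetical choices made for the example.

```python
import numpy as np


def toy_classifier_features(x, weights):
    """Hypothetical stand-in for a pretrained classifier.

    Returns the hidden activations of every layer (linear map + ReLU).
    """
    feats = []
    h = x
    for w in weights:
        h = np.maximum(h @ w, 0.0)
        feats.append(h)
    return feats


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) per example."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def regularized_vae_loss(x, x_hat, mu, log_var, weights, gammas):
    """Pixel reconstruction + KL + weighted feature-matching penalty.

    Assumption: the discriminative term is a squared error between the
    classifier features of the input and of its reconstruction, with one
    weight in `gammas` per classifier layer.
    """
    recon = np.sum((x - x_hat) ** 2, axis=-1)
    kl = gaussian_kl(mu, log_var)
    feats_x = toy_classifier_features(x, weights)
    feats_hat = toy_classifier_features(x_hat, weights)
    disc = sum(
        g * np.sum((fx - fh) ** 2, axis=-1)
        for g, fx, fh in zip(gammas, feats_x, feats_hat)
    )
    return float(np.mean(recon + kl + disc))


# Toy usage with random data standing in for images and encoder outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
x_hat = x + 0.1 * rng.normal(size=(4, 8))
mu = rng.normal(size=(4, 3))
log_var = rng.normal(size=(4, 3))
weights = [rng.normal(size=(8, 6)), rng.normal(size=(6, 5))]

plain = regularized_vae_loss(x, x_hat, mu, log_var, weights, gammas=[0.0, 0.0])
regularized = regularized_vae_loss(x, x_hat, mu, log_var, weights, gammas=[1.0, 1.0])
```

Setting every entry of `gammas` to zero recovers the plain VAE loss in this sketch, which makes it easy to compare the two objectives on the same architecture, as the figure captions describe.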

Paper Structure

This paper contains 15 sections, 10 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: The discriminative regularization model. Layers ${\bm{f}}_1$, ${\bm{f}}_2$, ${\bm{f}}_3$, ${\bm{d}}_1$, ${\bm{d}}_2$ and ${\bm{d}}_3$ represent convolutional layers, whereas layers ${\bm{g}}_3$, ${\bm{g}}_4$ and ${\bm{\mu}}_{\theta}$ represent fractionally strided convolutional layers.
  • Figure 2: CIFAR samples generated from variational autoencoders trained with and without the discriminative regularization. The architecture and the hyperparameters (except those directly related to discriminative regularization) are the same for both models. Our baseline VAE samples are similar in visual fidelity to other results in the literature (Mansimov et al., 2015). Discriminative regularization often does a good job of producing coherent objects, but the textures are usually muddled and the samples lack local detail.
  • Figure 3: SVHN samples with the standard variational autoencoders (left), real images (center), and samples using discriminative regularization (right). The discriminative regularizer improves the clarity and visual fidelity of the samples. SVHN is the only dataset where we did not observe unnatural patterning when using discriminative regularization.
  • Figure 4: Face samples generated with and without discriminative regularization. On balance, details of the face are better captured and more varied in the samples generated with discriminative regularization.
  • Figure 5: Face reconstructions with (top row) and without (bottom row) discriminative regularization. The face images used for the reconstructions (middle row) are from the held-out validation set and were not seen by the model during training. The architecture and the hyperparameters (except those directly related to discriminative regularization) are the same for both models. Discriminative regularization greatly enhances the model's ability to preserve identity, ethnicity, gender, and expressions. Note that the model does not improve the visual quality of the image background, which likely reflects the fact that the classifier's labels all describe facial attributes. Additional reconstructions can be seen in the appendix.
  • ...and 2 more figures