Good Semi-supervised Learning that Requires a Bad GAN

Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov

TL;DR

This work analyzes GAN-based semi-supervised learning (SSL) and shows that a perfect generator provides no SSL gain, while a carefully designed complement generator can place decision boundaries in low-density regions of the feature space. It introduces a practical framework that (i) increases generator entropy, (ii) generates low-density samples, and (iii) adds a conditional-entropy term to enforce strong true-fake beliefs, collectively approximating a KL divergence minimization to a complement distribution $p^*(x)$. The approach yields substantial empirical gains on MNIST, SVHN, and CIFAR-10 with small discriminators, achieving state-of-the-art single-model results and clarifying the trade-off between generator quality and SSL performance. These insights offer a principled path to robust SSL with GANs and have practical implications for designing discriminator-guided generators in semi-supervised visual tasks.

Abstract

Semi-supervised learning methods based on generative adversarial networks (GANs) obtained strong empirical results, but it is not clear 1) how the discriminator benefits from joint training with a generator, and 2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time. Theoretically, we show that given the discriminator objective, good semi-supervised learning indeed requires a bad generator, and propose the definition of a preferred generator. Empirically, we derive a novel formulation based on our analysis that substantially improves over feature matching GANs, obtaining state-of-the-art results on multiple benchmark datasets.

Paper Structure

This paper contains 27 sections, 5 theorems, 15 equations, 6 figures, 2 tables.

Key Result

Proposition 1

If $p_G = p$, and $D$ has infinite capacity, then for any optimal solution $D = (w, f)$ of the following supervised objective, there exists $D^* = (w^*, f^*)$ such that $D^*$ maximizes Eq. (eq:obj) and that for all $x$, $P_D(y | x, y \leq K) = P_{D^*}(y | x, y \leq K)$.
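
The quantity $P_D(y | x, y \leq K)$ in the proposition can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's proof: it assumes the discriminator outputs $K+1$ logits whose last entry is the "fake" class, and the helper names (`class_posterior_given_real`, `calibrate_fake_to_half`) are hypothetical. It shows that shifting all $K$ real-class logits by the same amount can set the fake probability to 1/2 (the uninformative true-fake belief one would expect when $p_G = p$) without changing the class posterior over the $K$ real classes.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def class_posterior_given_real(logits):
    # P_D(y | x, y <= K): renormalize the (K+1)-way softmax over the K real classes.
    # Assumption: the last logit corresponds to the fake class.
    probs = softmax(logits)
    real = probs[..., :-1]
    return real / real.sum(axis=-1, keepdims=True)

def calibrate_fake_to_half(logits):
    # Shift the K real-class logits by a common per-example offset so that
    # P(fake | x) = 1/2; the fake logit itself is left unchanged.
    real, fake = logits[..., :-1], logits[..., -1:]
    m = real.max(axis=-1, keepdims=True)
    lse = m + np.log(np.exp(real - m).sum(axis=-1, keepdims=True))
    return np.concatenate([real - lse + fake, fake], axis=-1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 11))  # 4 inputs, K = 10 real classes plus 1 fake class
shifted = calibrate_fake_to_half(logits)

print("P(fake | x) after shift:", softmax(shifted)[:, -1])
print("max change in P(y | x, y <= K):",
      np.abs(class_posterior_given_real(logits)
             - class_posterior_given_real(shifted)).max())
```

Because a common shift of the real-class logits cancels in the renormalized softmax, the classifier over the $K$ real classes is untouched; only the true-fake belief moves. This is the intuition behind the statement that a generator matching the data distribution adds nothing beyond the supervised solution.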

Figures (6)

  • Figure 1: Labeled and unlabeled data are denoted by cross and point respectively, and different colors indicate classes.
  • Figure 2: Left: Classification decision boundary, where the white line indicates true-fake boundary; Right: True-Fake decision boundary
  • Figure 3: Feature space at convergence
  • Figure 4: Left: Blue points are generated data, and the black shadow indicates unlabeled data. Middle and right can be interpreted as above.
  • Figure 5: Comparing images generated by FM and our model. FM generates collapsed samples, while our model generates diverse "bad" samples.
  • ...and 1 more figure

Theorems & Definitions (9)

  • Proposition 1
  • Lemma 1
  • Corollary 1
  • Proposition 2
  • proof
  • proof
  • Lemma 2
  • proof
  • proof