Joint-stochastic-approximation Random Fields with Application to Semi-supervised Learning

Yunfu Song, Zhijian Ou

TL;DR

The paper tackles semi-supervised learning (SSL) with deep generative models by addressing two core issues: the trade-off between mode covering and mode missing, and the conflict between classification and generation in directed models. It introduces joint-stochastic-approximation random fields (JRFs), deep undirected energy-based models trained via joint stochastic approximation (JSA) that pair a target random field (RF) $p_\theta(x)$ with an auxiliary directed generator $q_\beta(h|x)$ and use Langevin-style sampling to draw samples. Through stochastic-approximation (SA) learning, JRFs jointly optimize the RF and the auxiliary model, and are extended to SSL by modeling $(x,y)$ with a joint energy and incorporating supervised terms and the regularizers $R_c$ and $R_s$. Empirically, JRFs achieve competitive SSL classification on MNIST, SVHN, and CIFAR-10 while also delivering high-quality generation, with advantages over GAN-based approaches in avoiding mode collapse and over EBGMs in distribution matching. The work demonstrates, for the first time, that deep random-field models can effectively support SSL, suggesting a promising direction for undirected deep generative modeling in practical learning tasks.
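
As a rough illustration of the Langevin-style sampling mentioned above, the sketch below runs unadjusted Langevin dynamics on a toy one-dimensional energy. It is not the paper's sampler: the energy $U(x) = x^2/2$, the step size, the chain length, and the names `grad_energy` and `langevin_sample` are all assumptions chosen for illustration.

```python
import math
import random

def grad_energy(x):
    # Hypothetical toy energy U(x) = x^2 / 2, so the density
    # proportional to exp(-U(x)) is a standard normal N(0, 1).
    return x

def langevin_sample(x0, step=0.1, n_steps=300, rng=None):
    """Run unadjusted Langevin dynamics from x0 and return the final state."""
    if rng is None:
        rng = random.Random(0)
    x = x0
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        # Half-step down the energy gradient plus injected Gaussian noise.
        x = x - 0.5 * step * grad_energy(x) + math.sqrt(step) * noise
    return x

# Draw 1,000 approximately independent samples, each from its own chain
# started far from the mode (x0 = 3.0); all chains share one seeded generator.
rng = random.Random(0)
samples = [langevin_sample(3.0, rng=rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With a finite step size the chain targets a slightly biased distribution (here the stationary variance is a little above 1), which is one reason practical samplers use small steps or a correction step.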

Abstract

Our examination of deep generative models (DGMs) developed for semi-supervised learning (SSL), mainly GANs and VAEs, reveals two problems. First, mode missing and mode covering phenomena are observed in generation with GANs and VAEs. Second, there exists an awkward conflict between good classification and good generation in SSL when employing directed generative models. To address these problems, we formally present joint-stochastic-approximation random fields (JRFs) -- a new family of algorithms for building deep undirected generative models, with application to SSL. It is found through synthetic experiments that JRFs work well in balancing mode covering and mode missing, and match the empirical data distribution well. Empirically, JRFs achieve good classification results comparable to the state-of-the-art methods on widely adopted datasets -- MNIST, SVHN, and CIFAR-10 in SSL, and simultaneously perform good generation.

Paper Structure

This paper contains 18 sections, 1 theorem, 10 equations, 5 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

If Eq.(eq:jrf_unsup_gradient) is solvable, then we can apply the SA algorithm to find its root.
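
To make the idea of root-finding by stochastic approximation concrete, here is a minimal sketch of the classical Robbins-Monro iteration, which finds a root of $f(\theta) = 0$ from noisy evaluations of $f$. It uses i.i.d. noise on a toy linear function, so it illustrates the general SA principle only and not the paper's algorithm; the target function and the names `robbins_monro` and `noisy_f` are assumptions.

```python
import random

def robbins_monro(noisy_f, theta0, n_steps=20000, seed=0):
    """Approximate a root of f(theta) = E[noisy_f(theta, rng)]."""
    rng = random.Random(seed)
    theta = theta0
    for t in range(1, n_steps + 1):
        # Decaying step sizes: their sum diverges, the sum of squares converges.
        gamma = 1.0 / t
        theta = theta - gamma * noisy_f(theta, rng)
    return theta

def noisy_f(theta, rng):
    # Hypothetical toy target f(theta) = theta - 2 (root at 2),
    # observed with additive unit-variance Gaussian noise.
    return (theta - 2.0) + rng.gauss(0.0, 1.0)

theta_hat = robbins_monro(noisy_f, theta0=0.0)
```

Each individual evaluation is noisy, but the decaying step sizes average the noise out, so `theta_hat` ends up close to the true root at 2.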

Figures (5)

  • Figure 1: Figures of the first toy experiment. (a) Data distribution of the training set. (b) Stochastic generation of GAN without feature matching. (c) Stochastic generation of GAN with feature matching. (d) Stochastic generation of EBGMs. (e) Stochastic generation of JRFs. (f) Samples of JRFs from a revision process on stochastic generation. (g) The learned energy density of the random field of EBGMs. (h) The learned energy density of the random field of JRFs. The red dots represent the modes of the training distribution. Each generation contains 1,000 samples. For (g) and (h), white represents low energy and black represents high energy. The energy density of JRFs matches $p(x)$ better, while that of EBGMs shows trailers around the modes.
  • Figure 2: Comparing images generated by Improved-GAN and semi-JRFs on SVHN and CIFAR-10. Improved-GAN generates collapsed and strange samples, while semi-JRFs generate diverse and realistic ones.
  • Figure 3: Figures of the second toy experiment. (a) Data distribution of the training set. (b) The learned $p_\theta(x)$ of the random field of semi-JRFs. (c) The learned $p_\theta(x|y=1)$ of the random field. (d) The learned $p_\theta(x|y=2)$ of the random field. Each class has 4 labeled data points, with blue dots for class 1 and red dots for class 2. For the energy density figures, white represents low energy and black represents high energy.
  • Figure 4: Interpolation of semi-JRFs on MNIST. The leftmost and rightmost columns are from stochastic generation. The other columns show images generated by interpolating between them in latent variable space.
  • Figure 5: A pathway of conditional generation for semi-JRFs with MNIST. The generating process is described in the text. The loss of contrast in the images comes from the revision process, in which background pixel values rise above 0. Each row contains 10 different images conditioned on the same label.

Theorems & Definitions (2)

  • Proposition 1
  • proof