Joint-stochastic-approximation Autoencoders with Application to Semi-supervised Learning

Wenbo He; Zhijian Ou

TL;DR

This paper introduces joint-stochastic-approximation autoencoders (JAEs) for semi-supervised learning (SSL). It addresses two key gaps in deep generative models (DGMs): effective handling of discrete observations and latent variables, and learning criteria that directly target the data likelihood. JAEs couple a generative model pθ(x,h) with an inference model qφ(h|x) and optimize both by stochastic approximation, maximizing the data log-likelihood while minimizing the inclusive KL divergence KL(pθ(h|x) || qφ(h|x)). This enables stable training with discrete variables and with a variety of encoder–decoder structures. The semi-supervised extension incorporates labels through pθ(x,y,h) and qφ(y,h|x), uses labeled data to guide the discriminator-like term, and keeps posterior sampling efficient via Metropolis independence sampling (MIS). Empirically, JAEs perform robustly on synthetic tasks (factor analysis, GMMs, sequences) and achieve competitive SSL performance on MNIST and SVHN with discrete latent spaces, which the authors describe as, to the best of their knowledge, the first successful application of discrete latent variable models to challenging semi-supervised tasks. This work provides a new optimization paradigm for DGMs in SSL and highlights the practical viability of discrete latent representations for high-performance semi-supervised learning.
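The coupling of pθ(x,h) and qφ(h|x) through MIS can be illustrated on a toy problem. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: a two-state discrete latent h, a 1-D Gaussian emission, a softmax inference model, plain SGD, and a per-example cache of latent samples that persists across visits. All function names (mis_step, jsa_update, train) are hypothetical, and the semi-supervised (label) extension is not shown. Each step makes one MIS move that uses qφ(h|x) as the proposal and pθ(h|x) as the target, then takes gradient steps on log pθ(x,h) and log qφ(h|x) at the accepted sample.

```python
import math
import random

# Hypothetical toy model chosen for illustration (not the paper's architecture):
# K discrete latent states, p(h) = softmax(pi), p(x|h) = N(mu[h], 1),
# inference model q(h|x) = softmax(w[k] * x + b[k]).
K = 2

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]

def log_joint(theta, x, h):
    """log p_theta(x, h) with a unit-variance Gaussian emission."""
    pi, mu = theta
    return log_softmax(pi)[h] - 0.5 * (x - mu[h]) ** 2 - 0.5 * math.log(2 * math.pi)

def q_logprobs(phi, x):
    """log q_phi(h|x) for every h."""
    w, b = phi
    return log_softmax([w[k] * x + b[k] for k in range(K)])

def mis_step(theta, phi, x, h_old, rng):
    """One Metropolis independence sampler move targeting p_theta(h|x), proposing from q_phi(h|x)."""
    lq = q_logprobs(phi, x)
    h_new = rng.choices(range(K), weights=[math.exp(l) for l in lq])[0]
    # Ratio of importance weights p(x,h)/q(h|x) for proposed vs. current state.
    log_ratio = (log_joint(theta, x, h_new) - lq[h_new]) - (log_joint(theta, x, h_old) - lq[h_old])
    return h_new if rng.random() < math.exp(min(0.0, log_ratio)) else h_old

def jsa_update(theta, phi, x, h, lr):
    """SGD ascent on log p_theta(x,h) and log q_phi(h|x) at the MIS sample h."""
    pi, mu = theta
    w, b = phi
    p_h = [math.exp(l) for l in log_softmax(pi)]
    q_h = [math.exp(l) for l in q_logprobs(phi, x)]  # computed before w, b change
    for k in range(K):
        onehot = 1.0 if k == h else 0.0
        pi[k] += lr * (onehot - p_h[k])       # d/d pi_k of log softmax(pi)[h]
        w[k] += lr * (onehot - q_h[k]) * x    # d/d w_k of log q(h|x)
        b[k] += lr * (onehot - q_h[k])        # d/d b_k of log q(h|x)
    mu[h] += lr * (x - mu[h])                 # d/d mu_h of log N(x; mu_h, 1)

def train(data, steps=20000, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = ([0.0] * K, [-0.5, 0.5])          # (pi, mu)
    phi = ([0.0] * K, [0.0] * K)              # (w, b)
    cache = [rng.randrange(K) for _ in data]  # persistent latent sample per example
    for _ in range(steps):
        i = rng.randrange(len(data))
        cache[i] = mis_step(theta, phi, data[i], cache[i], rng)
        jsa_update(theta, phi, data[i], cache[i], lr)
    return theta, phi

# Usage: two well-separated 1-D clusters; the learned means should end up near -3 and +3.
data_rng = random.Random(1)
data = [data_rng.gauss(-3.0, 1.0) for _ in range(200)] + [data_rng.gauss(3.0, 1.0) for _ in range(200)]
theta, phi = train(data)
print(sorted(theta[1]))
```

Because h is drawn (approximately) from pθ(h|x), the θ-step is a stochastic estimate of the gradient of log pθ(x) and the φ-step is a stochastic estimate of the negative gradient of the inclusive KL. No reparameterization of h is required, which is why the same update applies unchanged to a discrete latent.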

Abstract

Our examination of existing deep generative models (DGMs), including VAEs and GANs, reveals two problems. First, their capability in handling discrete observations and latent codes is unsatisfactory, despite some interesting efforts. Second, both VAEs and GANs optimize criteria that are only indirectly related to the data likelihood. To address these problems, we formally present Joint-stochastic-approximation (JSA) autoencoders, a new family of algorithms for building deep directed generative models, with application to semi-supervised learning. The JSA learning algorithm directly maximizes the data log-likelihood and simultaneously minimizes the inclusive KL divergence between the posterior and the inference model. We provide theoretical results and conduct a series of experiments showing its advantages, such as robustness to structure mismatch between encoder and decoder and consistent handling of both discrete and continuous variables. In particular, we empirically show that JSA autoencoders with a discrete latent space achieve performance comparable to other state-of-the-art DGMs with continuous latent spaces in semi-supervised tasks on the widely adopted MNIST and SVHN datasets. To the best of our knowledge, this is the first demonstration that discrete latent variable models can be successfully applied to challenging semi-supervised tasks.
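The two objectives named in the abstract can be driven by a single kind of sample. The standard identities below (shown for illustration, not quoted from the paper) express both gradients as expectations under the posterior pθ(h|x), so exact or approximate posterior samples, such as those produced by a Metropolis independence sampler with proposal qφ(h|x), yield stochastic estimates of both. The third line is the acceptance probability for moving from the current sample h to a proposal h' drawn from qφ(·|x); it needs only the unnormalized target pθ(x,h), not pθ(x).

```latex
\begin{aligned}
\nabla_\theta \log p_\theta(x) &= \mathbb{E}_{p_\theta(h \mid x)}\big[\nabla_\theta \log p_\theta(x,h)\big], \\
\nabla_\phi \,\mathrm{KL}\big(p_\theta(h \mid x)\,\|\,q_\phi(h \mid x)\big) &= -\,\mathbb{E}_{p_\theta(h \mid x)}\big[\nabla_\phi \log q_\phi(h \mid x)\big], \\
\alpha(h \to h') &= \min\left\{1,\ \frac{p_\theta(x,h') \,/\, q_\phi(h' \mid x)}{p_\theta(x,h) \,/\, q_\phi(h \mid x)}\right\}.
\end{aligned}
```

The first identity follows from writing ∇θ pθ(x) as a sum over h of ∇θ pθ(x,h) and dividing by pθ(x); the second holds because the entropy term of the inclusive KL does not depend on φ.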
