Symmetric Equilibrium Learning of VAEs

Boris Flach; Dmitrij Schlesinger; Alexander Shekhovtsov

TL;DR

This work introduces symmetric equilibrium learning for VAEs by framing encoder and decoder as two players in a Nash game, addressing ELBO's inherent asymmetry and its limitations with complex priors, semi-supervised data, and structured latent spaces. The method defines two utilities, $L_p$ and $L_q$, enabling learning via simple gradient updates without reparametrisation, and proves a unique, stable equilibrium for lifted exponential-family models. It extends naturally to semi-supervised, unsupervised, and hierarchical VAEs, including implicit priors, and demonstrates practical applicability through experiments on MNIST and CelebA that match ELBO performance while improving encoder–decoder consistency and enabling tasks outside ELBO's scope. The approach broadens VAE applicability to more complex data, latent structures, and sampling-based learning scenarios, with potential impact on downstream tasks requiring robust bidirectional inference.

Abstract

We view variational autoencoders (VAEs) as decoder-encoder pairs, which map distributions in the data space to distributions in the latent space and vice versa. The standard learning approach for VAEs is the maximisation of the evidence lower bound (ELBO). It is asymmetric in that it aims at learning a latent variable model while using the encoder only as an auxiliary means. Moreover, it requires a closed-form a priori latent distribution. This limits its applicability in more complex scenarios, such as general semi-supervised learning and the use of complex generative models as priors. We propose a Nash equilibrium learning approach, which is symmetric with respect to the encoder and decoder and allows learning VAEs in situations where both the data and the latent distributions are accessible only by sampling. The flexibility and simplicity of this approach allow its application to a wide range of learning scenarios and downstream tasks.
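The following is a minimal toy sketch of the two-player idea, not the authors' implementation. This summary does not spell out the utilities, so the sketch assumes $L_p = \mathbb{E}_{\pi(x)\,q_\varphi(z \vert x)}[\log p_\theta(x \vert z)]$ for the decoder and $L_q = \mathbb{E}_{\pi(z)\,p_\theta(x \vert z)}[\log q_\varphi(z \vert x)]$ for the encoder. The scalar linear-Gaussian model, the fixed encoder variance `ENC_VAR`, and the names `train_toy`, `decoder_grad`, `encoder_grad`, and `make_data` are all hypothetical choices made for illustration. Each player takes a gradient step on its own utility while treating the other player's samples as constants, so no gradient is propagated through the sampling step.

```python
import numpy as np

ENC_VAR = 0.5  # assumed fixed encoder variance (toy choice, not from the paper)


def decoder_grad(a, x, z):
    """d/da of the mean of log N(x; a*z, 1); z is a fixed sample from the encoder."""
    return np.mean((x - a * z) * z)


def encoder_grad(c, x, z):
    """d/dc of the mean of log N(z; c*x, ENC_VAR); x is a fixed sample from the decoder."""
    return np.mean((z - c * x) * x) / ENC_VAR


def make_data(n=5000, seed=1):
    """Synthetic data x = z + noise with z ~ N(0, 1); only x is shown to the learner."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n) + rng.standard_normal(n)


def train_toy(x_data, steps=2000, batch=256, lr=0.05, seed=0):
    """Simultaneous gradient ascent of both players on their own (assumed) utilities."""
    rng = np.random.default_rng(seed)
    a, c = 0.5, 0.5  # decoder slope a, encoder slope c
    for _ in range(steps):
        # Decoder player: x drawn from the data, z sampled from the current encoder.
        x = rng.choice(x_data, size=batch)
        z = c * x + np.sqrt(ENC_VAR) * rng.standard_normal(batch)
        g_a = decoder_grad(a, x, z)

        # Encoder player: z drawn from the prior sampler, x sampled from the current decoder.
        z_p = rng.standard_normal(batch)
        x_p = a * z_p + rng.standard_normal(batch)
        g_c = encoder_grad(c, x_p, z_p)

        # Both players update at once; samples are never differentiated through.
        a += lr * g_a
        c += lr * g_c
    return a, c
```

Calling `train_toy(make_data())` returns the pair `(a, c)`. For this synthetic data the exact posterior is $N(x/2, 1/2)$, so under the stated assumptions the iterates should settle near $a \approx 1$ and $c \approx 0.5$, the point where encoder and decoder describe the same joint distribution. Both the prior and the data enter only through samples, which is the setting the abstract describes.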

Paper Structure (26 sections, 4 theorems, 46 equations, 8 figures, 1 table)

INTRODUCTION
PROBLEM FORMULATION
SYMMETRIC EQUILIBRIUM LEARNING
Uniqueness
Consistency
ADVANCED MODELS AND LEARNING SETUPS
Semi-Supervised Learning with Mixed Data
Unsupervised Learning
Hierarchical VAEs
RELATED WORK
Wake-Sleep
Implicit Prior
Symmetric Learning
Unsupervised and Semi-Supervised VAEs
EXPERIMENTS
...and 11 more sections

Key Result

Theorem 1

The two-player game with utility functions and strategies given by lifted exponential family distributions has a unique, asymptotically stable equilibrium.

Figures (8)

Figure 1: Ladder VAE (MNIST): FID scores and images generated from random latent codes and from limiting distributions of models learned by maximising ELBO and by symmetric equilibrium learning (images are shown by probabilities for better visibility).
Figure 2: MNIST: tSNE embeddings for the VAE with class labels. Points are coloured by digit classes. See text for explanation.
Figure 3: Given images and segmentations $(x_i,s_i)$ from the training set ($x_i$ are shown in the leftmost column), latent codes $z_{2i}$ are sampled from $q_{\varphi_2}(z_2 \,\vert\, x_i,s_i)$. Given segmentations $s_j$ shown in the top row, images $x_{i,j}$ are sampled from $p_{\theta_2}(x \,\vert\, s_j,z_{2i})$. Images are shown by mean values of the respective Gaussians for better visibility.
Figure 4: First two rows: training data $(x, s)$. Third and fourth rows: reconstructed images, and segmentations sampled from $p_\theta(x \,\vert\, s, z_2)$ and from $p_\theta(s\,\vert\, x, z)$ with $z\sim q_\varphi(z\,\vert\, x,s)$. Last two rows: sampling image--segmentation pairs from the full limiting distribution.
Figure 6: MNIST network architecture.
...and 3 more figures

Theorems & Definitions (6)

Theorem 1
Proposition 1
Theorem 1
proof
Proposition 1
proof
