
Bidirectional Variational Autoencoders

Bart Kosko, Olaoluwa Adigun

TL;DR

Bidirectional Variational Autoencoders (BVAEs) replace the conventional encoder–decoder pair with a single network that encodes in the forward direction and decodes in the backward direction through shared weights, trained with Bidirectional Backpropagation (B-BP). This yields the BELBO objective $\mathcal{L}_{BELBO}(x,\theta)=\mathbb{E}_{z\sim q_f(z|x,\theta)}[\ln p_b(x|z,\theta)] - D_{KL}(q_f(z|x,\theta)\,\|\,p(z|\theta))$, a lower bound on $\ln p(x|\theta)$ that lets the forward and backward likelihoods be optimized jointly without a separate encoder network. Results on MNIST, Fashion-MNIST, CIFAR-10, and CelebA-64 show that BVAEs match or slightly exceed unidirectional VAEs while using about 50% fewer parameters, so a single bidirectional network can preserve generative quality and improve efficiency for image synthesis and compression tasks.
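To make the BELBO objective concrete, here is a minimal one-sample Monte Carlo estimate in Python with NumPy. It rests on choices the summary does not state and that are assumptions here: the forward pass gives the mean and log-variance of a diagonal Gaussian $q_f(z|x,\theta)$, the prior is a standard normal, and the backward-pass likelihood is Bernoulli over binarized pixels. The function names and the toy shared matrix `W` are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    # Closed-form D_KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def bernoulli_log_likelihood(x, logits):
    # ln p(x|z) for binary pixels x given decoder logits, computed stably:
    # ln sigmoid(l) = -softplus(-l) and ln(1 - sigmoid(l)) = -softplus(l).
    log_p1 = -np.logaddexp(0.0, -logits)
    log_p0 = -np.logaddexp(0.0, logits)
    return np.sum(x * log_p1 + (1.0 - x) * log_p0)

def belbo_estimate(x, mu, log_var, decode, rng):
    # One-sample estimate of E_{z ~ q_f}[ln p(x|z)] - D_KL(q_f || p(z)),
    # assuming a standard-normal prior p(z) = N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps          # reparameterized sample from q_f
    reconstruction = bernoulli_log_likelihood(x, decode(z))
    return reconstruction - gaussian_kl_to_standard_normal(mu, log_var)

# Usage with a toy shared weight matrix (hypothetical, for illustration only).
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 4)) * 0.5              # latent dim 2, data dim 4
x = np.array([1.0, 0.0, 1.0, 1.0])
mu = W @ x                                          # forward pass gives the mean
log_var = np.full(2, -1.0)                          # assumption: fixed log-variance
print(belbo_estimate(x, mu, log_var, lambda z: W.T @ z, rng))  # backward pass reuses W.T
```

Because every term is an ordinary differentiable function of the shared weights, maximizing this estimate with any autodiff framework updates the same `W` through both the encoding and decoding terms.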

Abstract

We present the new bidirectional variational autoencoder (BVAE) network architecture. The BVAE uses a single neural network both to encode and to decode instead of an encoder-decoder network pair. The network encodes in the forward direction and decodes in the backward direction through the same synaptic web. Simulations compared BVAEs and ordinary VAEs on four image tasks: image reconstruction, classification, interpolation, and generation. The image datasets included MNIST handwritten digits, Fashion-MNIST, CIFAR-10, and CelebA-64 face images. The bidirectional structure of BVAEs cut the parameter count by almost 50%, yet BVAEs still slightly outperformed the unidirectional VAEs.
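The weight sharing described in the abstract can be illustrated with a dense layer whose single matrix $W$ maps inputs to features on the forward pass and whose transpose $W^T$ maps features back on the backward pass. This is a minimal NumPy sketch, not the authors' implementation: the class `BidirectionalDense`, the per-direction biases, the initialization, and the layer sizes are illustrative assumptions, and the sketch ignores the separate mean and log-variance outputs a VAE encoder needs.

```python
import numpy as np

class BidirectionalDense:
    """Hypothetical layer: one weight matrix W runs as W on the forward
    (encoding) pass and as W.T on the backward (decoding) pass.
    Assumption: each direction keeps its own bias vector."""

    def __init__(self, n_in, n_out, rng):
        scale = np.sqrt(2.0 / (n_in + n_out))          # Glorot-style init (assumption)
        self.W = rng.standard_normal((n_out, n_in)) * scale
        self.b_fwd = np.zeros(n_out)                     # forward-pass bias
        self.b_bwd = np.zeros(n_in)                      # backward-pass bias

    def forward(self, x):
        # Encoding direction: n_in -> n_out.
        return self.W @ x + self.b_fwd

    def backward(self, z):
        # Decoding direction through the same synapses: n_out -> n_in.
        return self.W.T @ z + self.b_bwd

    def n_params(self):
        return self.W.size + self.b_fwd.size + self.b_bwd.size


def unidirectional_params(sizes):
    # Separate encoder (sizes[0] -> sizes[-1]) plus a mirrored decoder,
    # each with its own weights and biases.
    pairs = list(zip(sizes[:-1], sizes[1:]))
    encoder = sum(a * b + b for a, b in pairs)
    decoder = sum(a * b + a for a, b in pairs)
    return encoder + decoder


def bidirectional_params(sizes, seed=0):
    # One stack of shared-weight layers serves as both encoder and decoder.
    rng = np.random.default_rng(seed)
    layers = [BidirectionalDense(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
    return sum(layer.n_params() for layer in layers)


sizes = [784, 512, 128]   # illustrative MNIST-sized MLP, not the paper's architecture
uni, bi = unidirectional_params(sizes), bidirectional_params(sizes)
print(uni, bi, bi / uni)
```

For these illustrative sizes the shared-weight stack has 468,880 parameters against 935,824 for a separate encoder and decoder, about 50.1% of the count. Under these assumptions only the direction-specific biases keep the saving just short of one half.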

Paper Structure

This paper contains 7 sections, 24 equations, 4 figures, 1 table, and 1 algorithm.

Figures (4)

  • Figure 1: Bidirectional vs. unidirectional variational autoencoders. Unidirectional VAEs use the forward passes of two separate networks for encoding and decoding. Bidirectional VAEs encode on the forward pass and decode on the backward pass with the same synaptic weight matrices in both directions, which cuts the number of tunable parameters roughly in half. (a) The decoder network with parameter $\theta$ approximates $p(x|z,\theta)$, and the encoder network with parameter $\phi$ approximates $q(z|x,\phi)$. (b) Bidirectional VAEs use the forward pass of a network with parameter $\theta$ to approximate $q(z|x,\theta)$ and the backward pass of the same network to approximate $p(x|z,\theta)$.
  • Figure 2: BELBO training of a bidirectional variational autoencoder with the bidirectional backpropagation algorithm. BELBO maximization uses a single network for encoding and decoding. The forward pass with likelihood $q_f(z|x, \theta)$ encodes the latent features. The backward pass with likelihood $p_b(x|z, \theta)$ decodes the latent features over the same web of synapses.
  • Figure 3: Bidirectional VAE with a residual network architecture, which cuts the tunable parameters roughly in half compared with unidirectional VAEs. (a) The bidirectional convolutional layer: convolution runs on the forward pass and transposed convolution runs on the backward pass with the same set of convolution masks. (b) The architecture of a bidirectional residual block with bidirectional skip connections.
  • Figure 4: $t$-SNE embedding for the MNIST handwritten digit dataset with latent space dimension $128$. (a) A simple linear classifier trained on the unidirectional VAE-compressed features achieved $95.2\%$ accuracy. (b) The same classifier achieved $97.32\%$ accuracy when trained on the BVAE-compressed features.
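Figure 3's bidirectional convolutional layer runs convolution forward and transposed convolution backward with the same mask. The 1D NumPy sketch below (hypothetical helper names, stride 1, no padding, a single channel) shows why this pairing is natural: transposed convolution with mask $k$ is the adjoint of convolution with $k$, so $\langle \mathrm{conv}(x,k), y\rangle = \langle x, \mathrm{conv}^T(y,k)\rangle$, just as $W^T$ is the adjoint of $W$ for a dense layer. It illustrates the idea only and is not the paper's layer.

```python
import numpy as np

def conv1d_forward(x, k):
    # Forward pass: stride-1 "valid" cross-correlation (what deep-learning
    # libraries call convolution), mapping len(x) -> len(x) - len(k) + 1.
    n_out = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n_out)])

def conv1d_transpose(y, k):
    # Backward pass: transposed convolution with the SAME mask k, mapping
    # len(y) -> len(y) + len(k) - 1. Each y[i] is scattered back, weighted
    # by k, onto the input window that produced it.
    x = np.zeros(len(y) + len(k) - 1)
    for i, yi in enumerate(y):
        x[i:i + len(k)] += yi * k
    return x

rng = np.random.default_rng(0)
k = rng.standard_normal(3)    # one shared convolution mask
x = rng.standard_normal(10)   # signal on the forward (encoding) side
y = rng.standard_normal(8)    # feature map on the backward (decoding) side
lhs = np.dot(conv1d_forward(x, k), y)
rhs = np.dot(x, conv1d_transpose(y, k))
print(np.isclose(lhs, rhs))   # adjoint identity: <conv(x), y> == <x, conv^T(y)>
```

In 2D, PyTorch's `torch.nn.functional.conv_transpose2d` applied with the same weight tensor, stride, and padding as `torch.nn.functional.conv2d` computes this adjoint, which is one plausible way to build such a layer in practice.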