Beta-Sigma VAE: Separating beta and decoder variance in Gaussian variational autoencoder

Seunghwan Kim, Seungkyu Lee

TL;DR

The paper tackles the blurry output problem in variational autoencoders by disentangling the decoder variance $σ^2_x$ from the beta parameter $β$ of beta-VAE, showing that treating them as a single integrated parameter leads to indeterminate likelihood and suboptimal optimization. It introduces Beta-Sigma VAE (BS-VAE), which uses a per-sample optimal decoder variance $σ^{2*}_x(z_i) = (x_i - μ_x(z_i))^2$ and reintroduces $β$ to separately control reconstruction noise and latent regularization, yielding a controllable rate-distortion curve and improved proxy metrics. Experimental results on CelebA and MNIST demonstrate that optimal $σ^2_x$ and optimal $β$ are distinct and that BS-VAE consistently outperforms constant-variance beta-VAEs across the $β$ spectrum, including better log-likelihood at $β=1$ and best FID at $β=10$. The approach is architecture-agnostic and provides a framework for predictable analysis of VAE performance, suggesting a path toward sharper generative outputs without sacrificing interpretability.
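
The separation described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation (see the repository linked in the abstract for that). It assumes a single scalar decoder variance per sample, shared across all data dimensions and set to the mean squared error, a diagonal-Gaussian encoder with a standard-normal prior, and a small clamp `eps` for numerical safety. The function names are hypothetical.

```python
import math


def gaussian_nll(x, mu, var):
    """Negative log-likelihood of x under N(mu, var * I), summed over dimensions."""
    d = len(x)
    sq_err = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return 0.5 * d * math.log(2.0 * math.pi * var) + sq_err / (2.0 * var)


def optimal_variance(x, mu, eps=1e-6):
    """Scalar variance that minimizes gaussian_nll for this sample.

    Assumption: one variance shared by all dimensions, so the minimizer is
    the mean squared error. eps (hypothetical) keeps the log finite.
    """
    mse = sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) / len(x)
    return max(mse, eps)


def kl_to_standard_normal(mu_z, logvar_z):
    """KL( N(mu_z, diag(exp(logvar_z))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu_z, logvar_z))


def bs_vae_loss(x, mu_x, mu_z, logvar_z, beta=1.0):
    """Reconstruction NLL at the per-sample optimal variance plus beta * KL.

    beta weights only the KL term, so it is no longer interchangeable
    with the decoder variance.
    """
    var = optimal_variance(x, mu_x)
    return gaussian_nll(x, mu_x, var) + beta * kl_to_standard_normal(mu_z, logvar_z)
```

In this sketch, changing `beta` moves the model along the rate-distortion trade-off, while the decoder variance is always fixed by the reconstruction error instead of being a second free knob.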

Abstract

Variational autoencoder (VAE) is an established generative model but is notorious for its blurriness. In this work, we investigate the blurry output problem of VAE and resolve it, exploiting the variance of Gaussian decoder and $β$ of beta-VAE. Specifically, we reveal that the indistinguishability of decoder variance and $β$ hinders appropriate analysis of the model by random likelihood value, and limits performance improvement by omitting the gain from $β$. To address the problem, we propose Beta-Sigma VAE (BS-VAE) that explicitly separates $β$ and decoder variance $σ^2_x$ in the model. Our method demonstrates not only superior performance in natural image synthesis but also controllable parameters and predictable analysis compared to conventional VAE. In our experimental evaluation, we employ the analysis of rate-distortion curve and proxy metrics on computer vision datasets. The code is available on https://github.com/overnap/BS-VAE

Paper Structure (11 sections, 9 equations, 4 figures, 1 table)

This paper contains 11 sections, 9 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: A toy example of poor reconstruction and poor generation on the CelebA dataset (Liu et al., 2015). Model A produces blurry reconstructions, but its reconstruction and generation quality are consistent. Model B produces relatively clear reconstructions, but its generated samples are blurry and unrealistic. The setup is identical to the one used in the experiments, and the samples were not cherry-picked.
  • Figure 2: A conceptual figure of optimizing $\sigma^2_x$ and $\beta$. (A) The dashed line indicates constant-$\sigma^2_x$ beta-VAEs with the same weights. Because only the single integrated parameter $\beta \cdot C \equiv \sigma^2_x$ is set, researchers can arbitrarily choose the $\beta$ and $C$ values for a given $\sigma^2_x$. This inconsistency harms VAE research. (B) (1) A typical VAE cannot control each parameter separately; here $\beta$ has almost no function beyond tuning $\sigma^2_x$. (2) Our method can tune the $\beta$ value while maintaining a reasonably low $\sigma^2_x$ value for the best likelihood. (3) The existing model with learnable decoder variance cannot adjust $\beta$, so it represents only a single point.
  • Figure 3: The rate-distortion curve of BS-VAEs and conventional beta-VAEs with constant $\sigma^2_x$. The constant variance can be interpreted in various ways, so the figure marks the optimal $\sigma^2_x$, which drives distortion to its lower bound, along with two common values of $\sigma^2_x$. BS-VAEs outperform the conventional models under any interpretation of $\sigma^2_x$.
  • Figure 4: Reconstructed and generated samples of common beta-VAEs with constant decoder variance and our BS-VAEs. Our models maintain good reconstruction quality across the tested $\beta$ values. The samples were not cherry-picked.
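
The indeterminacy described in the Figure 2 caption can be made concrete with a short derivation. This is a sketch using standard Gaussian-VAE notation and is not copied from the paper's equations; $D$ (data dimensionality), $q(z \mid x)$ (encoder), and $p(z)$ (prior) are symbols assumed here. With a constant decoder variance $\sigma^2_x$, the beta-VAE objective for one sample is

$$
\mathcal{L}(x) = \mathbb{E}_{q(z \mid x)}\!\left[\frac{\lVert x - \mu_x(z) \rVert^2}{2\sigma^2_x}\right] + \frac{D}{2}\log\!\left(2\pi\sigma^2_x\right) + \beta\, \mathrm{KL}\!\left(q(z \mid x)\,\Vert\,p(z)\right).
$$

Because $\sigma^2_x$ is fixed, the log term does not depend on the weights, and multiplying by the positive constant $2\sigma^2_x$ does not change the minimizer:

$$
2\sigma^2_x\,\mathcal{L}(x) = \mathbb{E}_{q(z \mid x)}\!\left[\lVert x - \mu_x(z) \rVert^2\right] + 2\beta\sigma^2_x\, \mathrm{KL}\!\left(q(z \mid x)\,\Vert\,p(z)\right) + \text{const}.
$$

Only the product $\beta\sigma^2_x$ affects training, so any two settings with the same product yield the same weights, yet the likelihood reported for those weights still depends on which $\sigma^2_x$ is assumed. This is one way to read the single integrated parameter in panel (A).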