Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

Denis Blessing, Xiaogang Jia, Johannes Esslinger, Francisco Vargas, Gerhard Neumann

TL;DR

This work introduces a benchmark that evaluates sampling methods on a standardized task suite using a broad range of performance criteria. It also studies existing metrics for quantifying mode collapse and introduces novel metrics for this purpose.

Abstract

Monte Carlo methods, Variational Inference, and their combinations play a pivotal role in sampling from intractable probability distributions. However, current studies lack a unified evaluation framework, relying on disparate performance measures and limited method comparisons across diverse tasks, complicating the assessment of progress and hindering the decision-making of practitioners. In response to these challenges, our work introduces a benchmark that evaluates sampling methods using a standardized task suite and a broad range of performance criteria. Moreover, we study existing metrics for quantifying mode collapse and introduce novel metrics for this purpose. Our findings provide insights into strengths and weaknesses of existing sampling methods, serving as a valuable reference for future developments. The code is publicly available here.

Paper Structure (33 sections, 36 equations, 7 figures, 17 tables)

Figures (7)

  • Figure 1: Illustration of the evidence upper bound (EUBO) and evidence lower bound (ELBO). The mode-seeking nature of the reverse KL results in $\text{ELBO} \ll \log Z$ if the model density $q^{\theta}$ (indicated by the samples $\color{red}{\times}$) averages over the target $\pi$ (indicated by the level plot) ($t_1$) and $\text{ELBO} \approx \log Z$ if $\pi \geq 0$ whenever $q^{\theta} \geq 0$ ($t_2$ to $t_4$). As a result, the ELBO is not sensitive to mode collapse. In contrast, the mass-covering nature of the forward KL ensures that $\text{EUBO} \gg \log Z$ if $q^{\theta} \approx 0$ whenever $\pi > 0$ ($t_2$) and $\text{EUBO} \approx \log Z$ if $q^{\theta} \geq 0$ whenever $\pi \geq 0$ ($t_1$). Consequently, the EUBO is well suited to quantify mode collapse.
  • Figure 2: Mean and standard deviation of EMC values for MoG and MoS across varying dimensions $d$.
  • Figure 3: Synthetic target densities. Left: First two dimensions of the funnel density. Middle: Mixture of Student-t distributions with $15$ components (MoS). Right: Mixture of $40$ isotropic Gaussian distributions (MoG).
  • Figure 4: Visualization of samples drawn from different sampling methods for Funnel (top) and MoG (bottom).
  • Figure 5: Visualization of samples drawn from different sampling methods for Digits (top) and Fashion (bottom).
  • ...and 2 more figures
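
The contrast between ELBO and EUBO described in the Figure 1 caption can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper or its code: the one-dimensional two-mode Gaussian target (normalized, so $\log Z = 0$), the two Gaussian models, and the helper names `log_normal`, `log_target`, `sample_target`, and `elbo_and_eubo`. The ELBO averages the log importance weight under the model $q^{\theta}$, whereas the EUBO averages it under the target $\pi$, which in this toy setting can be sampled exactly.

```python
import numpy as np

LOG_Z = 0.0  # the toy target below is normalized, so log Z = 0 (assumption)


def log_normal(x, mean, std):
    """Log density of a univariate Gaussian N(mean, std^2)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)


def log_target(x):
    """Log density of an equal-weight mixture of N(-3, 1) and N(3, 1)."""
    return np.logaddexp(log_normal(x, -3.0, 1.0), log_normal(x, 3.0, 1.0)) + np.log(0.5)


def sample_target(rng, n):
    """Exact samples from the toy target (possible only because it is synthetic)."""
    means = rng.choice([-3.0, 3.0], size=n)
    return means + rng.standard_normal(n)


def elbo_and_eubo(q_mean, q_std, n=200_000, seed=0):
    """Monte Carlo estimates of ELBO and EUBO for a Gaussian model q."""
    rng = np.random.default_rng(seed)
    x_q = q_mean + q_std * rng.standard_normal(n)  # samples from the model
    x_pi = sample_target(rng, n)                   # samples from the target

    def log_w(x):
        # log importance weight: log target density minus log model density
        return log_target(x) - log_normal(x, q_mean, q_std)

    elbo = log_w(x_q).mean()   # expectation under q  (reverse KL direction)
    eubo = log_w(x_pi).mean()  # expectation under pi (forward KL direction)
    return elbo, eubo


# Model that collapsed onto the right mode vs. a wide model covering both modes.
elbo_collapsed, eubo_collapsed = elbo_and_eubo(q_mean=3.0, q_std=1.0)
elbo_covering, eubo_covering = elbo_and_eubo(q_mean=0.0, q_std=3.2)

print(f"collapsed: ELBO={elbo_collapsed:.2f}, EUBO={eubo_collapsed:.2f}")
print(f"covering:  ELBO={elbo_covering:.2f}, EUBO={eubo_covering:.2f}")
```

In this toy setting the collapsed model attains the higher ELBO of the two even though it misses half of the target mass, while its EUBO is far above $\log Z$. The mass-covering model shows the opposite pattern: a somewhat lower ELBO, but an EUBO close to $\log Z$. Note that estimating the EUBO requires samples from the target, which are available here only because the target is synthetic.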