Mixed Variational Flows for Discrete Variables
Gian Carlo Diluvi, Benjamin Bloem-Reddy, Trevor Campbell
TL;DR
This work addresses variational inference for discrete distributions without continuous embeddings. It introduces MAD Mix, a variational family built from a measure-preserving and discrete (MAD) invertible map that acts on the target augmented with uniform variables. MAD Mix supports i.i.d. sampling and exact density evaluation, and it extends to joint discrete-continuous models by combining the MAD map with discretized Hamiltonian dynamics. The authors prove invertibility, derive the density of the pushforward, and establish measure preservation. In their experiments, MAD Mix yields high-fidelity discrete approximations, trains substantially faster and more stably than continuous-embedding flows, and attains sampling quality comparable to Gibbs sampling. Because the density is available, approximations can be evaluated directly through the ELBO, which makes the approach a practical option for discrete and mixed-variable Bayesian models.
Abstract
Variational flows allow practitioners to learn complex continuous distributions, but approximating discrete distributions remains a challenge. Current methodologies typically embed the discrete target in a continuous space (usually via continuous relaxation or dequantization) and then apply a continuous flow. These approaches involve a surrogate target that may not capture the original discrete target, might have biased or unstable gradients, and can create a difficult optimization problem. In this work, we develop a variational flow family for discrete distributions without any continuous embedding. First, we develop a measure-preserving and discrete (MAD) invertible map that leaves the discrete target invariant, and then create a mixed variational flow (MAD Mix) based on that map. Our family provides access to i.i.d. sampling and density evaluation with virtually no tuning effort. We also develop an extension to MAD Mix that handles joint discrete and continuous models. Our experiments suggest that MAD Mix produces more reliable approximations than continuous-embedding flows while being significantly faster to train.
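
To make the idea of a measure-preserving map on an augmented discrete target concrete, the following is a minimal one-dimensional sketch. It is a simplified stand-in and not the MAD map defined in the paper: it assumes a single discrete variable with strictly positive probabilities, augments it with one uniform variable, and uses an inverse-CDF construction with an irrational shift. The names make_map, mixflow_pdf, mixflow_sample and the shift xi are hypothetical, and the mixture-over-iterates form of the flow is assumed here for illustration.

```python
import numpy as np

def make_map(pi, xi=np.pi / 16):
    """Return (forward, inverse) for a map on {0..K-1} x [0,1] that leaves
    pi(x) * Unif(u) invariant. Assumes every pi[k] > 0 (illustrative only)."""
    pi = np.asarray(pi, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(pi)))  # cdf[k] = P(X < k)
    cdf[-1] = 1.0                                 # guard against round-off
    K = len(pi)

    def _shift(x, u, delta):
        # (x, u) -> rho is uniform on [0,1) under the augmented target.
        rho = (cdf[x] + u * pi[x] + delta) % 1.0
        # Invert the CDF: find the cell of rho and the position inside it.
        x_new = min(int(np.searchsorted(cdf, rho, side="right")) - 1, K - 1)
        u_new = (rho - cdf[x_new]) / pi[x_new]
        return x_new, u_new

    forward = lambda x, u: _shift(x, u, xi)
    inverse = lambda x, u: _shift(x, u, -xi)
    return forward, inverse

def mixflow_pdf(x, u, pi, q0, inverse, N):
    """Density at (x, u) of the uniform mixture of the first N pushforwards
    of the reference q0(x) * Unif(u) under the map."""
    x_start, total = x, 0.0
    for _ in range(N):
        # Change of variables: the per-step factors telescope to
        # pi[x_start] / pi[x] for the current backward iterate x.
        total += q0[x] * pi[x_start] / pi[x]
        x, u = inverse(x, u)
    return total / N

def mixflow_sample(rng, q0, forward, N):
    """Draw one sample: pick a flow length, sample q0, push forward."""
    n = rng.integers(N)
    x, u = rng.choice(len(q0), p=q0), rng.random()
    for _ in range(n):
        x, u = forward(x, u)
    return int(x), float(u)
```

In this sketch no parameters are optimized: the only choices are the reference q0, the shift xi, and the number of steps N. When q0 equals pi, every term of the mixture reduces to pi[x], which is a quick check that the map leaves the augmented target invariant.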
