Generative Marginalization Models
Sulin Liu, Peter J. Ramadge, Ryan P. Adams
TL;DR
Generative marginalization models (MAMs) tackle efficient marginal inference for high-dimensional discrete data by explicitly modeling all induced marginals $p(\boldx_\mathcal{S})$ under a marginalization self-consistency constraint. Using a dual-network setup that learns marginals $p_\theta(\boldx)$ and conditionals $p_\phi(\boldx|\cdot)$, together with an augmented input representation that includes a missing-value symbol, MAMs estimate any marginal with a single forward pass while supporting scalable maximum likelihood estimation (MLE) and energy-based (EB) training. The approach yields orders-of-magnitude speedups in marginal evaluation and scales any-order generation to high-dimensional problems in the EB setting, outperforming autoregressive models (ARMs) and any-order ARMs (AO-ARMs) on a range of discrete tasks including images, text, molecules, and physical systems. These results highlight the potential of MAMs for flexible, domain-guided marginal queries, outlier detection, and design tasks in real-world discrete-data problems.
Abstract
We introduce marginalization models (MAMs), a new family of generative models for high-dimensional discrete data. They offer scalable and flexible generative modeling by explicitly modeling all induced marginal distributions. Marginalization models enable fast approximation of arbitrary marginal probabilities with a single forward pass of the neural network, which overcomes a major limitation of arbitrary marginal inference models, such as any-order autoregressive models. MAMs also address the scalability bottleneck encountered in training any-order generative models for high-dimensional problems in the context of energy-based training, where the goal is to match the learned distribution to a given desired probability distribution (specified by an unnormalized log-probability function such as an energy or reward function). We propose scalable methods for learning the marginals, grounded in the concept of "marginalization self-consistency". We demonstrate the effectiveness of the proposed model on a variety of discrete data distributions, including images, text, physical systems, and molecules, in both maximum likelihood and energy-based training settings. MAMs achieve orders-of-magnitude speedups in evaluating marginal probabilities in both settings. For energy-based training tasks, MAMs enable any-order generative modeling of high-dimensional problems beyond the scale of previous methods. Code is available at https://github.com/PrincetonLIPS/MaM.
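To make "marginalization self-consistency" concrete, the following minimal sketch checks the property on a toy distribution. It assumes the constraint takes the form $p(\boldx_\mathcal{S}) = \sum_{x_i} p(\boldx_\mathcal{S}, x_i)$ for any variable $x_i$ outside $\mathcal{S}$. It is not the authors' implementation: the joint is a small explicit table over binary variables, marginals are computed by brute-force summation rather than by a neural network, and all names (`MISSING`, `make_joint`, `marginal`, `self_consistency_gap`) are hypothetical.

```python
import itertools
import random

# Hypothetical stand-in for the missing-value symbol in the augmented input.
MISSING = None


def make_joint(num_vars, seed=0):
    """Build a random normalized joint distribution over binary vectors."""
    rng = random.Random(seed)
    states = list(itertools.product((0, 1), repeat=num_vars))
    weights = [rng.random() for _ in states]
    total = sum(weights)
    return {state: w / total for state, w in zip(states, weights)}


def marginal(joint, partial):
    """Exact marginal p(x_S): sum the joint over every completion of `partial`.

    Entries equal to MISSING are marginalized out; all others must match.
    A learned model would replace this exponential-cost sum with one forward pass.
    """
    return sum(
        prob
        for state, prob in joint.items()
        if all(obs is MISSING or obs == val for obs, val in zip(partial, state))
    )


def self_consistency_gap(joint, partial, index):
    """Return |p(x_S) - sum over x_i of p(x_S, x_i)| for a missing position."""
    assert partial[index] is MISSING, "index must point at a missing variable"
    filled_in = 0.0
    for value in (0, 1):
        extended = list(partial)
        extended[index] = value
        filled_in += marginal(joint, tuple(extended))
    return abs(marginal(joint, partial) - filled_in)


joint = make_joint(num_vars=4)
query = (1, MISSING, 0, MISSING)  # observe x_0 = 1 and x_2 = 0 only
gap = self_consistency_gap(joint, query, index=1)
```

For an exact marginal the gap is zero up to floating-point error. A learned marginal network would satisfy the constraint only approximately, which is why the constraint is imposed during training.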
