Compositional Generative Modeling: A Single Model is Not All You Need
Yilun Du, Leslie Kaelbling
TL;DR
The paper addresses the scalability bottleneck of monolithic generative models by advocating compositional modeling: building large systems from simpler components, each of which models a subset of the variables. The approach uses factorization, energy-based representations, and sampling machinery (Langevin dynamics, diffusion models, annealed sampling) to combine distributions efficiently and to reprogram models for unseen tasks. The paper illustrates the idea across planning, image and video synthesis, robotics, and unsupervised factor discovery, and discusses open challenges in automatic structure discovery and robustness. Overall, it presents compositional generative modeling as a practical, scalable alternative to ever-larger monolithic models.
Abstract
Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach lets us learn distributions in a more data-efficient manner and generalize to parts of the data distribution unseen at training time. We further show how it lets us program and construct new generative models for tasks completely unseen at training time. Finally, we show that in many cases, we can discover separate compositional components from data.
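
As a toy illustration of the composition idea summarized in the TL;DR (energy-based representations combined and then sampled with Langevin dynamics), the sketch below composes two one-dimensional Gaussian "component models" by summing their energies, which corresponds to multiplying their densities, and draws samples with unadjusted Langevin dynamics. It is not the authors' implementation. The Gaussian components, the function names (gaussian_energy_grad, compose, langevin_sample), and the step size and step count are all assumptions chosen so that the result can be checked against the closed-form product of two Gaussians.

```python
import numpy as np


def gaussian_energy_grad(mean, std):
    """Gradient of the energy E(x) = (x - mean)^2 / (2 * std^2).

    Hypothetical stand-in for a learned component model.
    """
    def grad(x):
        return (x - mean) / std**2
    return grad


def compose(*grads):
    """Product of densities = sum of energies = sum of energy gradients."""
    def grad(x):
        return sum(g(x) for g in grads)
    return grad


def langevin_sample(grad_energy, n_samples=5000, n_steps=500,
                    step_size=0.01, seed=0):
    """Unadjusted Langevin dynamics run on n_samples independent chains.

    Update: x <- x - step_size * grad E(x) + sqrt(2 * step_size) * noise.
    The step size and step count are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_samples)  # arbitrary initialization
    for _ in range(n_steps):
        noise = rng.normal(size=n_samples)
        x = x - step_size * grad_energy(x) + np.sqrt(2.0 * step_size) * noise
    return x


# Two toy components: N(-1, 1) and N(3, 1).
grad_a = gaussian_energy_grad(mean=-1.0, std=1.0)
grad_b = gaussian_energy_grad(mean=3.0, std=1.0)

# Sample from the composed (product) distribution.
samples = langevin_sample(compose(grad_a, grad_b))
print(samples.mean(), samples.var())
```

For these two components the product density is itself Gaussian with mean 1.0 and variance 0.5, so the printed sample statistics should land close to those values, up to Monte Carlo noise and the small discretization bias of the unadjusted sampler. With learned, non-Gaussian components no such closed form exists, which is where sampling-based composition becomes useful.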
