
Compositional Generative Modeling: A Single Model is Not All You Need

Yilun Du, Leslie Kaelbling

TL;DR

This paper addresses the scalability bottleneck of monolithic generative models by advocating compositional modeling, in which a system is built from simpler components that each model a subset of the variables. The approach uses factorization, energy-based representations, and sampling techniques (Langevin dynamics, diffusion models, annealed sampling) to combine distributions efficiently and to reprogram models for unseen tasks. The paper gives concrete illustrations in planning, image and video synthesis, robotics, and unsupervised factor discovery, and discusses open challenges and directions for automatic structure discovery and robustness. Overall, it presents compositional generative modeling as a practical, scalable alternative to ever-larger monolithic models.

Abstract

Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach enables us to learn distributions in a more data-efficient manner, enabling generalization to parts of the data distribution unseen at training time. We further show how this enables us to program and construct new generative models for tasks completely unseen at training. Finally, we show that in many cases, we can discover separate compositional components from data.

Paper Structure (10 sections, 22 equations, 15 figures)

Figures (15)

  • Figure 1: Rising Size and Cost of Models. While much of AI research has focused on constructing increasingly larger monolithic models, training costs are rising exponentially, by a factor of 3 every year, with current models already costing several hundred million dollars per training run. Data from epoch2023aitrends.
  • Figure 2: Limited Compositionality in Multimodal Models. Existing large multimodal models such as GPT-4V and DALL-E 3 still struggle with simple textual queries, often falling back to biases in data.
  • Figure 3: Generalizing Outside Training Data. Given a narrow slice of training data, we can learn generative models that generalize outside the data through composition. We learn separate generative models to model each axis of the data -- the composition of models can then cover the entire data space.
  • Figure 4: Distribution Composition -- When modeling simple product (top) or mixture (bottom) compositions, learning two compositional models on the factors is more data efficient than learning a single monolithic model on the product distribution. The monolithic model is trained on twice as much data as individual factors.
  • Figure 5: Compositional Trajectory Generation -- By factorizing a trajectory generative model into a set of components, models are able to more accurately simulate dynamics from limited trajectories (a) and train in fewer training iterations (b).
  • ...and 10 more figures
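
The product composition mentioned in the TL;DR and in Figure 4 can be illustrated with a small sketch. It is not taken from the paper: it assumes two hand-written 1D Gaussian components in place of learned models, and the function names (`gaussian_score`, `langevin_sample`, `product_of_gaussians`) are hypothetical. The idea it shows is that a product of densities corresponds to a sum of energies, so the score of the composed distribution is the sum of the component scores, and unadjusted Langevin dynamics can sample from the composition without training a joint model.

```python
import math
import random


def gaussian_score(mu, sigma):
    """Score function d/dx log N(x; mu, sigma^2) of one component (assumed analytic here)."""
    def score(x):
        return -(x - mu) / sigma ** 2
    return score


def langevin_sample(score_fns, n_samples=2000, n_steps=300, step=0.01, seed=0):
    """Sample from the product of components with unadjusted Langevin dynamics.

    The product of densities has a score equal to the sum of component scores,
    so each update follows that summed score plus Gaussian noise.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)  # arbitrary initialization
        for _ in range(n_steps):
            total_score = sum(f(x) for f in score_fns)
            x += step * total_score + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def product_of_gaussians(mu1, s1, mu2, s2):
    """Closed-form mean and std of the normalized product of two 1D Gaussians."""
    precision = 1.0 / s1 ** 2 + 1.0 / s2 ** 2
    mean = (mu1 / s1 ** 2 + mu2 / s2 ** 2) / precision
    return mean, math.sqrt(1.0 / precision)


if __name__ == "__main__":
    components = [gaussian_score(-1.0, 1.0), gaussian_score(2.0, 0.5)]
    draws = langevin_sample(components)
    empirical_mean = sum(draws) / len(draws)
    exact_mean, exact_std = product_of_gaussians(-1.0, 1.0, 2.0, 0.5)
    print(f"empirical mean {empirical_mean:.3f}, closed-form mean {exact_mean:.3f}")
```

Gaussians are used only because their product has a closed form to compare against. With learned energy-based or diffusion components, the same summed-score update would apply, but the step size and the finite number of steps introduce a small bias, which is one reason annealed sampling is used in practice.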