Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation

Michal Balcerak; Suprosana Shit; Chinmay Prabhakar; Sebastian Kaltenbach; Michael S. Albergo; Yilun Du; Bjoern Menze

Abstract

Energy-based models for discrete domains, such as graphs, explicitly capture relative likelihoods, which naturally enables composable probabilistic inference tasks such as conditional generation or enforcing constraints at test time. However, discrete energy-based models typically struggle with efficient, high-quality sampling: off-support regions often contain spurious local minima that trap samplers and destabilize training. This has historically resulted in a fidelity gap relative to discrete diffusion models. We introduce Graph Energy Matching (GEM), a generative framework for graphs that closes this fidelity gap. Motivated by the transport map optimization perspective of the Jordan-Kinderlehrer-Otto (JKO) scheme, GEM learns a permutation-invariant potential energy that simultaneously provides transport-aligned guidance from noise toward data and refines samples within regions of high data likelihood. We further introduce a sampling protocol that uses an energy-based switch to move from (i) rapid, gradient-guided transport toward high-probability regions to (ii) a mixing regime that explores the learned graph distribution. On molecular graph benchmarks, GEM matches or exceeds strong discrete diffusion baselines. Beyond sample quality, explicit modeling of relative likelihood enables targeted exploration at inference time, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs.

Paper Structure (56 sections, 38 equations, 8 figures, 4 tables, 2 algorithms)

Introduction
Contributions.
Preliminaries: Energy Matching and JKO
Sampling (two regimes).
Graph Energy Matching
Graph representation.
Local and Permutation-Invariant Cost.
Learnable Components.
Discrete Transport-Aligned Proposal ($\epsilon\to 0$)
Discrete Mixing Proposal ($\epsilon>0$)
Initializations and Proposal Schedules
Determining the Regime.
Sample Initialization.
Training objectives
Noise and Data Interpolation via Minibatch coupling.
...and 41 more sections

Figures (8)

Figure 1: GEM Sampling Overview. Two perspectives on GEM sampling: a probability-distribution view (top) of the two-phase MCMC process, and a samples view (bottom) showing molecular trajectories from MOSES. Sampling alternates between a transport phase, where gradient-informed, greedy proposals rapidly move samples toward regions of high probability, and a mixing phase employing MH acceptance to ensure the correct stationary distribution and efficient mixing between modes. Color key: transport (orange), MH (blue).
Figure 2: Proposal Scoring. Local graph edit proposals with scoring given by $q_\mathrm{mixing}(x\to y)$. The size of the dots encodes proposal probabilities. Alignment with the gradient direction increases the probability, while distance penalties discourage larger edits. If the candidate is the origin (stay), we resample to promote exploration. (Shown with equal node/edge weights $\lambda^L_{\mathcal{V}} = \lambda^L_{\mathcal{E}}$ for illustrative purposes.)
Figure 3: Energy and Sampling Trajectories. Top: energy evolution for noise- vs. data-initialized chains, with mean $\pm$ 1 std bands and reference levels at $\mathbb{E}_{x\sim \pi_\mathrm{data}}[V_\theta(x)]$ and $\mathbb{E}_{x\sim \pi_0}[V_\theta(x)]$. Noise-initialized chains use greedy proposals to reach the data distribution; data-initialized chains use temperature annealing (low initial $\beta_{\mathrm{mh}}$, gradually increasing) to recover novelty. Bottom: validity and novelty trajectories for data-initialized (teal) and noise-initialized (red) chains over inference steps. Uniqueness $\approx$ 100%.
Figure 4: VUN and FCD vs Steps. VUN (higher is better) and FCD (lower is better) versus inference steps on MOSES for noise initialization (uniform) with greedy warmup proposal, data initialization with annealed proposal, and DeFoG (marginal).
Figure 5: GEM Energy-Weighted vs Cost-Only Geodesics. Molecule trajectories along continuous geodesic paths for MOSES. Top row: GEM energy-weighted geodesic; bottom row: cost-only geodesic. Boxes indicate samples along the continuous path.
...and 3 more figures
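
The two-phase sampling idea described in the abstract and in the captions of Figures 1 and 2 can be illustrated with a small toy sketch. This is not the paper's implementation. It assumes a hand-written energy (Hamming distance to two hypothetical target graphs) in place of the learned potential $V_\theta$, a uniform symmetric single-edge proposal in place of $q_\mathrm{mixing}$, and a fixed hypothetical threshold `switch_energy` that is re-checked at every step as the energy-based switch. All names in the block are illustrative.

```python
import math
import random

# Hypothetical toy "data": edge-indicator vectors of 4-node graphs (6 possible edges).
TARGETS = [(1, 1, 0, 1, 0, 0), (0, 0, 1, 0, 1, 1)]


def energy(x):
    """Toy energy: Hamming distance to the nearest target (stand-in for a learned V_theta)."""
    return min(sum(a != b for a, b in zip(x, t)) for t in TARGETS)


def flip(x, i):
    """Return a copy of x with edge i toggled (a single local graph edit)."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def greedy_step(x):
    """Transport phase: move to the single-edge edit with the lowest energy."""
    return min((flip(x, i) for i in range(len(x))), key=energy)


def mh_step(x, beta, rng):
    """Mixing phase: uniform single-edge proposal with Metropolis-Hastings acceptance."""
    y = flip(x, rng.randrange(len(x)))
    delta = energy(y) - energy(x)
    # The proposal is symmetric, so the acceptance ratio reduces to exp(-beta * delta).
    if delta <= 0 or rng.random() < math.exp(-beta * delta):
        return y
    return x


def sample(x0, steps, switch_energy=1, beta=2.0, seed=0):
    """Use greedy transport while the energy is above switch_energy, MH otherwise."""
    rng = random.Random(seed)
    x, phases = x0, []
    for _ in range(steps):
        if energy(x) > switch_energy:
            x = greedy_step(x)
            phases.append("transport")
        else:
            x = mh_step(x, beta, rng)
            phases.append("mh")
    return x, phases


x0 = (0, 1, 1, 1, 1, 0)  # "noise" start, equally far from both targets
final, phases = sample(x0, steps=20)
print(energy(x0), energy(final), phases[:4])
```

In this toy setting the chain spends its first steps in the greedy phase and hands over to Metropolis-Hastings once the energy reaches the threshold. If an accepted uphill move pushes the energy back above the threshold, the next step is greedy again, which loosely mirrors the alternation described in Figure 1.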
