Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation

Michal Balcerak, Suprosanna Shit, Chinmay Prabhakar, Sebastian Kaltenbach, Michael S. Albergo, Yilun Du, Bjoern Menze

Abstract

Energy-based models for discrete domains, such as graphs, explicitly capture relative likelihoods, naturally enabling composable probabilistic inference tasks like conditional generation or enforcing constraints at test time. However, discrete energy-based models typically struggle with efficient and high-quality sampling, as off-support regions often contain spurious local minima, trapping samplers and causing training instabilities. This has historically resulted in a fidelity gap relative to discrete diffusion models. We introduce Graph Energy Matching (GEM), a generative framework for graphs that closes this fidelity gap. Motivated by the transport map optimization perspective of the Jordan-Kinderlehrer-Otto (JKO) scheme, GEM learns a permutation-invariant potential energy that simultaneously provides transport-aligned guidance from noise toward data and refines samples within regions of high data likelihood. Further, we introduce a sampling protocol that leverages an energy-based switch to seamlessly bridge (i) rapid, gradient-guided transport toward high-probability regions and (ii) a mixing regime for exploration of the learned graph distribution. On molecular graph benchmarks, GEM matches or exceeds strong discrete diffusion baselines. Beyond sample quality, explicit modeling of relative likelihood enables targeted exploration at inference time, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs.

Paper Structure (56 sections, 38 equations, 8 figures, 4 tables, 2 algorithms)

Figures (8)

  • Figure 1: GEM Sampling Overview. Two perspectives on GEM sampling: a probability-distribution view (top) of the two-phase MCMC process, and a samples view (bottom) showing molecular trajectories from MOSES. Sampling alternates between a transport phase, where gradient-informed, greedy proposals rapidly move samples toward regions of high probability, and a mixing phase employing Metropolis-Hastings (MH) acceptance to ensure the correct stationary distribution and efficient mixing between modes. Color key: transport (orange), MH (blue).
  • Figure 2: Proposal Scoring. Local graph edit proposals with scoring given by $q_\mathrm{mixing}(x\to y)$. The size of dots encodes proposal probabilities. Alignment with the gradient direction increases the probability, while distance penalties discourage larger edits. If the candidate is the origin (stay), we resample to promote exploration. Node and edge weights are set equal ($\lambda^L_{\mathcal{V}} = \lambda^L_{\mathcal{E}}$) for illustrative purposes.
  • Figure 3: Energy and Sampling Trajectories. Top: energy evolution for noise- vs. data-initialized chains, with mean $\pm$ 1 std bands and reference levels at $\mathbb{E}_{x\sim \pi_\mathrm{data}}[V_\theta(x)]$ and $\mathbb{E}_{x\sim \pi_0}[V_\theta(x)]$. Noise-initialized chains use greedy proposals to reach the data distribution; data-initialized chains use temperature annealing (low initial $\beta_{\mathrm{mh}}$, gradually increasing) to recover novelty. Bottom: validity and novelty trajectories for data-initialized (teal) and noise-initialized (red) chains over inference steps. Uniqueness $\approx$100%.
  • Figure 4: VUN and FCD vs Steps. VUN (higher is better) and FCD (lower is better) versus inference steps on MOSES for noise initialization (uniform) with greedy warmup proposal, data initialization with annealed proposal, and DeFoG (marginal).
  • Figure 5: GEM Energy-Weighted vs Cost-Only Geodesics. Molecule trajectories along continuous geodesic paths for MOSES. Top row: GEM energy-weighted geodesic; bottom row: cost-only geodesic. Boxes indicate samples along the continuous path.
  • ...and 3 more figures
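
The two-phase sampling described in the abstract and in the Figure 1 caption can be illustrated with a small toy sketch. This is not the GEM implementation: the graph is reduced to a tuple of edge bits, `toy_energy` is a hypothetical stand-in for the learned potential $V_\theta$, the greedy step scans all single-edge flips instead of using gradient-informed proposals, and the names `switch_level` and `beta` are assumptions introduced only for illustration.

```python
import math
import random


def toy_energy(edges, target):
    """Hypothetical energy: Hamming distance between two edge bit-vectors."""
    return sum(a != b for a, b in zip(edges, target))


def flip(edges, i):
    """Return a copy of `edges` with slot i toggled (one local graph edit)."""
    out = list(edges)
    out[i] = 1 - out[i]
    return tuple(out)


def two_phase_sample(x, energy, switch_level, beta, n_steps, rng):
    """Greedy transport while energy > switch_level, otherwise an MH step."""
    trace = []
    for _ in range(n_steps):
        e_x = energy(x)
        if e_x > switch_level:
            # Transport phase: move to the lowest-energy single-edit neighbour.
            x = min((flip(x, i) for i in range(len(x))), key=energy)
            trace.append("transport")
        else:
            # Mixing phase: symmetric single-flip proposal, MH acceptance.
            y = flip(x, rng.randrange(len(x)))
            accept = min(1.0, math.exp(-beta * (energy(y) - e_x)))
            if rng.random() < accept:
                x = y
            trace.append("mh")
    return x, trace


# Usage: start far from the toy "data" graph and record which phase ran.
target = (1, 0, 1, 1, 0, 0)
start = (0, 1, 0, 0, 1, 1)
sample, phases = two_phase_sample(
    start, lambda g: toy_energy(g, target),
    switch_level=1, beta=2.0, n_steps=50, rng=random.Random(0),
)
print(phases[:8], toy_energy(sample, target))
```

In this sketch the switch is a plain energy threshold, so a chain that drifts above the threshold during mixing falls back to greedy transport; how the paper sets and schedules its switch is not specified in this excerpt.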