Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
Michal Balcerak, Tamaz Amiranashvili, Antonio Terpin, Suprosanna Shit, Lea Bogensperger, Sebastian Kaltenbach, Petros Koumoutsakos, Bjoern Menze
TL;DR
Energy Matching addresses a limitation of flow-based and diffusion models, namely that they cannot readily integrate priors and partial observations, by unifying transport dynamics with an energy-based likelihood through a single time-independent scalar potential $V_\theta(x)$. Grounded in the Jordan–Kinderlehrer–Otto framework, it uses a two-regime training procedure: far from the data manifold, samples are transported from noise toward the data with an OT-like flow; near the manifold, probability mass concentrates around the data in a Boltzmann equilibrium $\rho_{eq}(x) \propto \exp(-V_\theta(x)/\varepsilon_{\max})$. The result is a single scalar energy whose gradient drives generation and which also serves as a flexible prior for inverse problems, with additional interaction energies enabling controlled diversity. The paper reports substantially higher fidelity than prior EBMs on CIFAR-10 and ImageNet, without auxiliary generators. Energy Matching also gives direct access to the likelihood structure of the data and enables local intrinsic dimension (LID) estimation through the Hessian of $V_\theta$, with fewer approximations than diffusion-based methods. Overall, the framework aims to make EBMs more practical by combining simulation-free transport, explicit likelihood modeling, and versatile priors for generative modeling across diverse domains.
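To make the Boltzmann equilibrium $\rho_{eq}(x) \propto \exp(-V_\theta(x)/\varepsilon_{\max})$ concrete, the following is a minimal sketch of overdamped Langevin sampling driven by the gradient of a scalar potential. It is not the paper's implementation: the quadratic `grad_V` is a hand-written stand-in for a learned $V_\theta$, and the names `langevin_sample`, `eps`, `dt`, and all numeric values are assumptions chosen for illustration. For $V(x) = x^2/2$ the equilibrium is a Gaussian with variance `eps`, which gives a simple sanity check.

```python
import math
import random


def grad_V(x):
    # Toy stand-in (assumption) for the gradient of a learned potential:
    # V(x) = x^2 / 2, so grad V(x) = x.
    return x


def langevin_sample(n_particles=2000, n_steps=500, dt=0.02, eps=0.25, seed=0):
    """Euler-Maruyama steps of dx = -grad V(x) dt + sqrt(2 * eps) dW."""
    rng = random.Random(seed)
    # Start from broad noise, far from the equilibrium distribution.
    xs = [rng.gauss(0.0, 3.0) for _ in range(n_particles)]
    noise_scale = math.sqrt(2.0 * eps * dt)
    for _ in range(n_steps):
        xs = [x - grad_V(x) * dt + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
    return xs


samples = langevin_sample()
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# For this quadratic potential the variance should be close to eps = 0.25.
print(f"empirical mean: {mean:.3f}, empirical variance: {var:.3f}")
```

The temperature-like parameter `eps` controls how tightly samples concentrate around the minimum of the potential: smaller values give a sharper equilibrium, larger values a broader one.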
Abstract
Current state-of-the-art generative models map noise to data distributions by matching flows or scores. A key limitation of these models is their inability to readily integrate available partial observations and additional priors. In contrast, energy-based models (EBMs) address this by incorporating corresponding scalar energy terms. Here, we propose Energy Matching, a framework that endows flow-based approaches with the flexibility of EBMs. Far from the data manifold, samples move from noise to data along irrotational, optimal transport paths. As they approach the data manifold, an entropic energy term guides the system into a Boltzmann equilibrium distribution, explicitly capturing the underlying likelihood structure of the data. We parameterize these dynamics with a single time-independent scalar field, which serves as both a powerful generator and a flexible prior for effective regularization of inverse problems. The present method substantially outperforms existing EBMs on CIFAR-10 and ImageNet generation in terms of fidelity, while retaining the simulation-free training of transport-based approaches away from the data manifold. Furthermore, we leverage the flexibility of the method to introduce an interaction energy that supports the exploration of diverse modes, which we demonstrate in a controlled protein generation setting. This approach learns a scalar potential energy without time conditioning, auxiliary generators, or additional networks, marking a significant departure from recent EBM methods. We believe this simplified yet rigorous formulation significantly advances the capabilities of EBMs and paves the way for their wider adoption in generative modeling across diverse domains.
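The role of a scalar energy as a prior for inverse problems can be illustrated with a small sketch. It assumes, as is common in energy-based regularization, that the objective is the sum of a prior energy and a quadratic data-fidelity term; the exact weighting used in the paper is not stated in the abstract. The toy prior `grad_prior`, the partial observation of only the first coordinate, and the names `map_estimate`, `sigma`, and `lr` are all hypothetical.

```python
def grad_prior(x, mu=(1.0, -2.0)):
    # Gradient of a toy prior energy (assumption): V(x) = 0.5 * ||x - mu||^2.
    # A learned scalar potential would take this place.
    return [x[0] - mu[0], x[1] - mu[1]]


def grad_fidelity(x, y, sigma):
    # Partial observation: only the first coordinate is measured, y = x[0] + noise.
    # Gradient of (x[0] - y)^2 / (2 * sigma^2) with respect to x.
    return [(x[0] - y) / sigma ** 2, 0.0]


def map_estimate(y, sigma=0.5, lr=0.05, n_steps=2000):
    """Gradient descent on prior energy plus data-fidelity term."""
    x = [0.0, 0.0]
    for _ in range(n_steps):
        gp = grad_prior(x)
        gf = grad_fidelity(x, y, sigma)
        x = [xi - lr * (a + b) for xi, a, b in zip(x, gp, gf)]
    return x


x_map = map_estimate(y=3.0)
# The observed coordinate settles between the prior mode (1.0) and the
# observation (3.0); the unobserved coordinate is filled in by the prior (-2.0).
print(x_map)
```

Because both terms are scalar energies, adding a further observation or constraint amounts to adding one more gradient term to the update, which is the flexibility the abstract attributes to EBMs.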
