Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models

Runqian Wang, Yilun Du

TL;DR

Equilibrium Matching (EqM) replaces the non-equilibrium, time-conditional dynamics of diffusion and flow models with a time-invariant equilibrium gradient over an implicit energy landscape. A model is trained to match a target gradient defined on a corrupted interpolation x_γ and scaled by a decaying function c(γ). This enables sampling by gradient descent with adjustable step sizes and adaptive optimizers, and the framework also supports explicit-energy variants. The authors prove that EqM learns the data manifold and that gradient-based sampling converges. Empirically, EqM reaches an FID of 1.90 on class-conditional ImageNet 256×256, outperforming the other tested methods. It also scales well and supports flexible inference-time tasks, including partial denoising, OOD detection, and compositional generation. By bridging flow-based and energy-based modeling, EqM offers a simple, optimization-driven inference paradigm for scalable, high-fidelity image generation.
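
The TL;DR describes training against a target gradient built from a corrupted interpolation x_γ and a decaying function c(γ). The summary gives neither formula, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes x_γ = γ·x + (1 − γ)·ε with γ ~ U[0, 1] (γ = 1 is clean data, γ = 0 is pure noise), a target of (ε − x)·c(γ), and a squared-error loss. The helpers `linear_decay`, `eqm_target`, and `eqm_loss` are hypothetical names, and the linear c(γ) is only a placeholder decay.

```python
import numpy as np


def linear_decay(gamma):
    # Hypothetical c(gamma): equals 1 at pure noise (gamma = 0) and 0 at the data (gamma = 1).
    return 1.0 - gamma


def eqm_target(x, eps, gamma, c=linear_decay):
    """Return the corrupted interpolation x_gamma and its target gradient (assumed forms)."""
    # Reshape per-sample gamma so it broadcasts over the remaining dimensions of x.
    g = np.asarray(gamma, dtype=float).reshape(-1, *([1] * (x.ndim - 1)))
    x_gamma = g * x + (1.0 - g) * eps  # gamma = 1 recovers the data, gamma = 0 is pure noise
    target = (eps - x) * c(g)          # points from data toward noise, scaled by c(gamma)
    return x_gamma, target


def eqm_loss(model, x, rng, c=linear_decay):
    """Mean squared error between model(x_gamma) and the assumed target gradient."""
    eps = rng.standard_normal(x.shape)
    gamma = rng.uniform(0.0, 1.0, size=x.shape[0])
    x_gamma, target = eqm_target(x, eps, gamma, c)
    pred = model(x_gamma)  # the model receives no gamma / time input
    sq_err = np.sum((pred - target) ** 2, axis=tuple(range(1, x.ndim)))
    return float(np.mean(sq_err))


# Usage with a trivial stand-in "model" that always predicts zero.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 3))
print(eqm_loss(lambda z: np.zeros_like(z), x, rng))
```

Because the model never sees γ, one learned field has to be valid at every corruption level. This is what makes the time-invariant, landscape-style view possible.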

Abstract

We introduce Equilibrium Matching (EqM), a generative modeling framework built from an equilibrium dynamics perspective. EqM discards the non-equilibrium, time-conditional dynamics in traditional diffusion and flow-based generative models and instead learns the equilibrium gradient of an implicit energy landscape. Through this approach, we can adopt an optimization-based sampling process at inference time, where samples are obtained by gradient descent on the learned landscape with adjustable step sizes, adaptive optimizers, and adaptive compute. EqM surpasses the generation performance of diffusion/flow models empirically, achieving an FID of 1.90 on ImageNet 256×256. EqM is also theoretically justified to learn and sample from the data manifold. Beyond generation, EqM is a flexible framework that naturally handles tasks including partially noised image denoising, OOD detection, and image composition. By replacing time-conditional velocities with a unified equilibrium landscape, EqM offers a tighter bridge between flow and energy-based models and a simple route to optimization-driven inference.
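
The abstract frames sampling as gradient descent on the learned landscape, with adjustable step sizes, adaptive optimizers, and adaptive compute. The sketch below illustrates all three on a toy problem and should not be read as the paper's sampler. The trained network is replaced by the gradient of a hypothetical quadratic energy E(x) = ½‖x − μ‖². "Adaptive optimizer" is illustrated with Nesterov-style momentum, and "adaptive compute" is taken to mean stopping once the gradient norm falls below a tolerance. `sample_gd`, `grad_fn`, `mu`, and `tol` are illustrative names.

```python
import numpy as np


def sample_gd(grad_fn, x0, step=0.1, momentum=0.0, tol=1e-4, max_steps=10_000):
    """Descend a gradient field until its norm drops below tol (adaptive compute).

    momentum = 0 gives plain gradient descent; momentum > 0 gives a
    Nesterov-style accelerated update (one example of an adaptive optimizer).
    """
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for k in range(max_steps):
        y = x + momentum * v         # Nesterov look-ahead point (equals x when momentum == 0)
        g = grad_fn(y)
        if np.linalg.norm(g) < tol:  # the landscape is flat here: stop early
            return y, k
        v = momentum * v - step * g
        x = x + v
    return x, max_steps


# Hypothetical stand-in for a trained EqM network: the gradient of the toy
# energy E(x) = 0.5 * ||x - mu||^2, whose single minimum sits at mu.
mu = np.array([1.0, -2.0])


def grad_fn(x):
    return x - mu


x0 = np.random.default_rng(0).standard_normal(2)  # start from noise
x_gd, n_gd = sample_gd(grad_fn, x0)
x_nag, n_nag = sample_gd(grad_fn, x0, momentum=0.5)
print(n_gd, n_nag)
```

On this toy landscape the momentum variant reaches the tolerance in fewer steps than plain descent. The gradient-norm test also lets each sample use only as many steps as it needs, instead of following a fixed time schedule.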

Paper Structure

This paper contains 28 sections, 31 equations, 13 figures, 10 tables, 2 algorithms.

Figures (13)

  • Figure 1: Conceptual 2D Visualization. We compare the conceptual 2D dynamics of Equilibrium Matching and Flow Matching under 2 ground truths (marked by stars). Left: Flow Matching learns non-equilibrium velocity that varies over time. Right: Equilibrium Matching learns an equilibrium gradient that is time-invariant.
  • Figure 1: Class-Conditional ImageNet 256×256 Generation. EqM-XL/2 achieves a 1.90 FID, surpassing other tested methods.
  • Figure 2: Curated Samples. We present curated samples generated by our EqM-XL/2 model.
  • Figure 3: Different Target Gradient Fields. Several settings outperform the noise-unconditional Flow Matching baseline. The best performance is achieved with the truncated decay c_trunc and hyperparameter a = 0.8.
  • Figure 4: Sampling Process Visualization. We present intermediate samples from XL/2 models using the same 0.004 step size. EqM (bottom) produces realistic images much earlier than FM (top).
  • ...and 8 more figures
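
Figure 3 reports that the truncated decay c_trunc with a = 0.8 works best, but the caption does not give its formula. A minimal sketch follows, assuming c_trunc holds the target-gradient scale at 1 for γ ≤ a and then decays it linearly to 0 at γ = 1, where γ = 1 is assumed to denote clean data. Both the exact form and the helper name `c_trunc` are assumptions for illustration.

```python
import numpy as np


def c_trunc(gamma, a=0.8):
    """Assumed truncated decay: flat at 1 up to gamma = a, then linear down to 0 at gamma = 1."""
    gamma = np.asarray(gamma, dtype=float)
    # (1 - gamma) / (1 - a) exceeds 1 for gamma < a, so clipping at 1 gives the flat part.
    return np.minimum(1.0, (1.0 - gamma) / (1.0 - a))


gammas = np.linspace(0.0, 1.0, 6)  # 0.0, 0.2, ..., 1.0
print(np.round(c_trunc(gammas), 3))
```

Under this assumption, the target gradient keeps full magnitude across most corruption levels and vanishes only near the data. That behavior matches the intuition that real samples should sit at stationary points of the landscape.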

Theorems & Definitions (3)

  • Proof: Derivation of Statement st1
  • Proof: Derivation of Statement st11
  • Proof: Derivation of Statement st2