Improving Adversarial Energy-Based Model via Diffusion Process

Cong Geng, Tian Han, Peng-Tao Jiang, Hao Zhang, Jinwei Chen, Søren Hauberg, Bo Li

TL;DR

This work addresses the difficulty of training energy-based models (EBMs) with expensive MCMC by embedding adversarial EBMs into a denoising diffusion process, breaking long generation into smaller conditional steps. It introduces a generator-driven variational distribution and a symmetric Jeffrey divergence to stabilize training and better match distributions, along with a gradient-penalty term to stabilize energy optimization. The approach yields substantial gains in sample quality and density estimation over existing adversarial EBMs, performs competitively with diffusion-based models on image generation, and demonstrates useful out-of-distribution detection via energy scores. The framework offers a scalable, MCMC-free pathway to joint generation and density estimation, with practical impact on tasks requiring both high-fidelity samples and tractable likelihoods.

Abstract

Generative models have shown strong generation ability, while efficient likelihood estimation is less explored. Energy-based models (EBMs) define a flexible energy function to parameterize unnormalized densities efficiently but are notoriously difficult to train. Adversarial EBMs introduce a generator to form a minimax training game that avoids the expensive MCMC sampling used in traditional EBMs, but a noticeable gap between adversarial EBMs and other strong generative models still exists. Inspired by diffusion-based models, we embed EBMs into each denoising step to split a long generation process into several smaller steps. In addition, we employ a symmetric Jeffrey divergence and introduce a variational posterior distribution for the generator's training to address the main challenges of adversarial EBMs. Our experiments show significant improvement in generation compared to existing adversarial EBMs, while also providing a useful energy function for efficient density estimation.
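
The abstract relies on a symmetric Jeffrey divergence without defining it. As a rough illustration only, the sketch below uses the standard textbook definition, J(p, q) = KL(p || q) + KL(q || p), on two small discrete distributions. The function names `kl` and `jeffrey` and the example distributions are hypothetical, and this is not the paper's training objective, which operates on the model and generator distributions at each denoising step.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffrey(p, q):
    """Symmetric Jeffrey divergence: the sum of both KL directions."""
    return kl(p, q) + kl(q, p)

# Hypothetical example distributions, chosen only for illustration.
p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl(p, q)    # KL(p || q)
reverse = kl(q, p)    # KL(q || p)
sym = jeffrey(p, q)   # same value whichever argument comes first
```

Here the two KL directions differ, while their sum does not depend on the argument order. Each direction penalizes a different kind of mismatch, and the symmetric form combines both.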

Paper Structure (39 sections, 48 equations, 9 figures, 11 tables)

Figures (9)

  • Figure 1: Density estimation and generation on 25-Gaussians and pinwheel datasets. For each dataset, the first row shows the estimated densities and the second row shows generated samples.
  • Figure 2: Exact samples from the NICE model and generated samples from the generator for WGAN-based methods and our DDAEBM.
  • Figure 3: Randomly generated images with DDAEBM on $32\times 32$ CIFAR-10, $64\times 64$ CelebA and $128\times 128$ LSUN church datasets.
  • Figure 4: Histogram of unnormalized log-likelihood for comparison of real data and fake data or noisy data with the standard deviations being 0.01, 0.1, and 0.5. We provide the histogram comparison on CelebA test images.
  • Figure 5: The FID score vs. the number of training epochs for a different number of time steps.
  • ...and 4 more figures