Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood

Yaxuan Zhu, Jianwen Xie, Yingnian Wu, Ruiqi Gao

TL;DR

This work proposes cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM.

Abstract

Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming, and there exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models. To close this gap, inspired by the recent efforts of learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the two models are jointly estimated within a cooperative training framework: samples from the initializer serve as starting points that are refined by a few MCMC sampling steps from the EBM. The EBM is then optimized by maximizing recovery likelihood, while the initializer model is optimized by learning from the difference between the refined samples and the initial samples. In addition, we introduce several practical design choices for EBM training to further improve sample quality. Combining these advances, our approach significantly boosts generation performance compared to existing EBM methods on the CIFAR-10 and ImageNet datasets. We also demonstrate the effectiveness of our models on several downstream tasks, including classifier-free guided generation, compositional generation, image inpainting, and out-of-distribution detection.
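The cooperative update at a single noise level can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the authors' implementation: a hypothetical 1D EBM with energy f_theta(x) = theta[0]*x - theta[1]*x**2/2, a hypothetical linear initializer whose proposal is phi[0]*x_noisy + phi[1] (only its mean is used), and unscaled Gaussian perturbation x_noisy = x + sigma*eps (DRL additionally rescales the signal). The EBM refines the initializer's proposals with a few Langevin steps on the recovery log-density f_theta(x) - (x - x_noisy)**2/(2*sigma**2). It then ascends the recovery likelihood (data statistics minus refined-sample statistics), and the initializer is pulled toward the refined samples by a squared-error step.

```python
import numpy as np

# Hypothetical toy EBM: f_theta(x) = theta[0] * x - theta[1] * x**2 / 2.
def energy_grad_x(x, theta):
    """d f_theta / d x, evaluated elementwise."""
    return theta[0] - theta[1] * x

def energy_grad_theta(x):
    """Batch mean of d f_theta / d theta = [x, -x**2 / 2]."""
    return np.array([x.mean(), -(x ** 2).mean() / 2.0])

def langevin_refine(x0, x_noisy, theta, sigma, step, n_steps, rng):
    """A few Langevin steps on the recovery log-density
    f_theta(x) - (x - x_noisy)**2 / (2 * sigma**2), started at x0."""
    x = x0.copy()
    for _ in range(n_steps):
        grad = energy_grad_x(x, theta) - (x - x_noisy) / sigma ** 2
        x = x + 0.5 * step ** 2 * grad + step * rng.standard_normal(x.shape)
    return x

def cooperative_step(x_data, theta, phi, rng, sigma=0.3, step=0.05,
                     n_steps=15, lr_ebm=0.05, lr_init=0.1):
    """One illustrative cooperative update at a single noise level."""
    # Noisy version of the data that both models condition on.
    x_noisy = x_data + sigma * rng.standard_normal(x_data.shape)
    # Hypothetical linear initializer proposes starting points.
    x_init = phi[0] * x_noisy + phi[1]
    # The EBM refines the proposals with short-run MCMC.
    x_ref = langevin_refine(x_init, x_noisy, theta, sigma, step, n_steps, rng)
    # EBM: gradient ascent on the recovery log-likelihood
    # (data statistics minus refined-sample statistics).
    theta = theta + lr_ebm * (energy_grad_theta(x_data) - energy_grad_theta(x_ref))
    # Initializer: gradient descent on 0.5 * mean((x_ref - x_init)**2),
    # treating the refined samples as fixed targets.
    resid = x_ref - x_init
    phi = phi + lr_init * np.array([(resid * x_noisy).mean(), resid.mean()])
    return theta, phi, x_ref

# Usage on hypothetical data drawn from N(2, 0.5**2).
rng = np.random.default_rng(0)
x_data = 2.0 + 0.5 * rng.standard_normal(512)
theta, phi = np.array([0.0, 1.0]), np.array([1.0, 0.0])
for _ in range(100):
    theta, phi, x_ref = cooperative_step(x_data, theta, phi, rng)
```

The design point the sketch tries to convey is the division of labor: the initializer only has to land near the right region, while the EBM's short MCMC run does the fine correction. No gradient flows through the MCMC chain into either model.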

Paper Structure (33 sections, 21 equations, 15 figures, 11 tables, 2 algorithms)

Figures (15)

  • Figure 1: Unconditional generated examples on CIFAR-10 and ImageNet ($32 \times 32$) datasets.
  • Figure 2: Conditional generation on ImageNet ($32 \times 32$) dataset with classifier-free guidance. (a) Random image samples generated with different guided weights $w=0.0, 0.5, 1.0$ and $3.0$; (b) Samples generated with fixed noise under different guided weights. The class label is set to the Siamese cat category. Sub-images at the same position share identical random noise and class label and differ only in their guided weights; (c) FID scores across different guided weights; (d) Inception scores across different guided weights.
  • Figure 3: Density estimation with CDRL on a 2D checkerboard distribution, using 5 noise levels. Top: observed samples at each noise level. Middle: density fitted by CDRL at each noise level. Bottom: generated samples at each noise level.
  • Figure 4: Results of attribute-compositional generation on CelebA ($64\times64$) with guided weight $w=3$. Left: generated samples under different attribute compositions. Right: control attributes ("$\surd$", "$\times$" and "-" indicate "True", "False" and "No Control" respectively).
  • Figure 5: Noise schedule. The green line shows the noise schedule used by DRL (Gao et al., 2021), while the red line shows the noise schedule used by our CDRL.
  • ...and 10 more figures
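Classifier-free guidance (Figure 2) can be illustrated on energies with a toy sketch. It assumes the common guidance rule transferred to energies, f_w(x, c) = (1 + w) f(x, c) - w f(x), where f(x) is the unconditional energy (label dropped). The exact parameterization used in CDRL may differ. The Gaussian energies below are hypothetical and chosen so the result is checkable by hand: with f(x, c) = -(x - mu)**2/2 and f(x) = -x**2/2, the guided target is N((1 + w) mu, 1). Larger w therefore pushes samples further toward the class-specific direction.

```python
import numpy as np

def guided_grad(grad_cond, grad_uncond, w):
    """Gradient in x of the guided energy (1 + w) * f(x, c) - w * f(x)."""
    return lambda x: (1.0 + w) * grad_cond(x) - w * grad_uncond(x)

def langevin_sample(grad_fn, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics targeting exp(f), given the gradient of f."""
    x = x0.copy()
    for _ in range(n_steps):
        x = x + 0.5 * step ** 2 * grad_fn(x) + step * rng.standard_normal(x.shape)
    return x

# Hypothetical toy energies: f(x, c) = -(x - mu)**2 / 2 and f(x) = -x**2 / 2.
mu = 1.5
grad_cond = lambda x: -(x - mu)
grad_uncond = lambda x: -x

rng = np.random.default_rng(0)
for w in (0.0, 1.0, 3.0):
    g = guided_grad(grad_cond, grad_uncond, w)
    samples = langevin_sample(g, np.zeros(4000), step=0.3, n_steps=500, rng=rng)
    # For these energies the guided target is N((1 + w) * mu, 1).
    print(f"w={w}: sample mean={samples.mean():.2f}, target={(1.0 + w) * mu:.2f}")
```

Because the guided quantity is itself just another energy, it plugs into the same Langevin sampler used for unguided generation.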