
Multi-scale Generative Modeling for Fast Sampling

Xiongye Xiao, Shixuan Li, Luzhe Huang, Gengshuo Liu, Trung-Kien Nguyen, Yi Huang, Di Chang, Mykel J. Kochenderfer, Paul Bogdan

TL;DR

This work proposes a multi-scale generative modeling framework in the wavelet domain that handles low- and high-frequency bands with distinct strategies, using multi-scale generative adversarial learning for the high-frequency bands.

Abstract

While working in the spatial domain can lead to ill-conditioned scores caused by power-law decay, recent advances in diffusion-based generative models have shown that transitioning to the wavelet domain offers a promising alternative. However, the wavelet domain poses its own challenges, especially the sparse representation of high-frequency coefficients, which deviates significantly from the Gaussian assumptions of the diffusion process. To this end, we propose a multi-scale generative modeling framework in the wavelet domain that employs distinct strategies for low- and high-frequency bands: we apply score-based generative modeling with well-conditioned scores to the low-frequency bands and multi-scale generative adversarial learning to the high-frequency bands. As supported by theoretical analysis and experimental results, our model significantly improves performance while reducing the number of trainable parameters, the number of sampling steps, and the sampling time.
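
The abstract's argument rests on splitting an image into one low-frequency band (LL) and three high-frequency bands (LH, HL, HH). A minimal sketch of one level of the orthonormal 2D Haar transform and its inverse shows both the split and the sparsity of the high-frequency coefficients. It assumes the Haar wavelet (this summary does not say which wavelet the paper uses) and a pure-Python list-of-rows image; `haar2d` and `ihaar2d` are hypothetical helper names.

```python
def haar2d(img):
    """One level of the orthonormal 2D Haar transform (hypothetical helper).

    img is a list of rows with even height and width. Returns the four
    half-resolution bands (LL, LH, HL, HH). LH/HL naming differs between
    libraries; here LH holds top-minus-bottom differences and HL holds
    left-minus-right differences.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 2)  # scaled local average (low frequency)
            r_lh.append((a + b - c - d) / 2)  # top minus bottom
            r_hl.append((a - b + c - d) / 2)  # left minus right
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: the 2x2 transform is orthonormal, so its inverse is its transpose."""
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, v, hz, dg = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + v + hz + dg) / 2
            img[2 * i][2 * j + 1] = (s + v - hz - dg) / 2
            img[2 * i + 1][2 * j] = (s - v + hz - dg) / 2
            img[2 * i + 1][2 * j + 1] = (s - v - hz + dg) / 2
    return img


# A piecewise-constant "image": zeros left of column 3, ones from column 3 on.
img = [[1.0 if j >= 3 else 0.0 for j in range(8)] for _ in range(8)]
ll, lh, hl, hh = haar2d(img)
high = [v for band in (lh, hl, hh) for row in band for v in row]
nonzero = sum(1 for v in high if abs(v) > 1e-12)
print(nonzero, len(high))  # 4 48
```

On this 8x8 step image only 4 of the 48 high-frequency coefficients are nonzero and the LL band carries 36 of the 40 units of energy: a toy instance of the sparse, decidedly non-Gaussian high-frequency statistics the abstract describes.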

Paper Structure

This paper contains 47 sections, 5 theorems, 67 equations, 19 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

Consider the Gaussian distribution $p = \mathcal{N}(0, \Sigma)$ and the distribution $\tilde{p}_0$ obtained from the time-reversed SDE. The Kullback-Leibler divergence between $p$ and $\tilde{p}_0$ relates to the covariance matrix $\Sigma$ as: … with: …, where $f(t) = t - \log(1 + t)$, $d$ is the dimension of $\Sigma$, and $\Sigma$ is normalized so that $\mathrm{Tr}(\Sigma) = d$.
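
The function $f(t) = t - \log(1 + t)$ in Theorem 1 is the same function that appears in the closed-form KL divergence between zero-mean Gaussians. For a covariance $\Sigma$ with eigenvalues $\lambda_1, \dots, \lambda_d$, the standard identity $\mathrm{KL}(\mathcal{N}(0,\Sigma) \,\|\, \mathcal{N}(0, I)) = \frac{1}{2}\left(\mathrm{Tr}(\Sigma) - d - \log\det\Sigma\right) = \frac{1}{2}\sum_i f(\lambda_i - 1)$ makes this visible. The sketch below evaluates that identity for a hypothetical power-law spectrum normalized so that $\mathrm{Tr}(\Sigma) = d$. It illustrates how small eigenvalues inflate the divergence; it is not the bound stated in Theorem 1, and `power_law_spectrum` with its exponent `alpha` is an assumption chosen for illustration.

```python
import math


def f(t):
    """f(t) = t - log(1 + t) for t > -1; f >= 0 with equality only at t = 0."""
    return t - math.log1p(t)


def kl_gaussian_to_standard(eigvals):
    """KL(N(0, Sigma) || N(0, I)) from the eigenvalues of Sigma.

    Standard closed form 0.5 * (Tr(Sigma) - d - log det Sigma), written
    eigenvalue by eigenvalue as 0.5 * sum_i f(lambda_i - 1).
    """
    return 0.5 * sum(f(lam - 1.0) for lam in eigvals)


def power_law_spectrum(d, alpha):
    """Hypothetical spectrum lambda_k proportional to k**(-alpha), k = 1..d,
    rescaled so that Tr(Sigma) = d as in Theorem 1."""
    raw = [k ** (-alpha) for k in range(1, d + 1)]
    scale = d / sum(raw)
    return [scale * r for r in raw]


# Faster spectral decay -> smaller eigenvalues -> larger divergence.
for alpha in (0.0, 1.0, 2.0):
    lams = power_law_spectrum(64, alpha)
    print(alpha, min(lams), kl_gaussian_to_standard(lams))
```

Under the normalization $\mathrm{Tr}(\Sigma) = d$ the terms $\lambda_i - 1$ sum to zero, so the divergence reduces to $-\frac{1}{2}\log\det\Sigma$ and grows as the spectrum decays faster.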

Figures (19)

  • Figure 1: Diffusion trajectories of the wavelet coefficients. The high-frequency components (LH, HL, HH) are overwhelmed by noise at an earlier stage, marked by the green line, whereas the low-frequency component (LL) degrades more slowly.
  • Figure 2: The framework of WMGM, featuring a hierarchical wavelet transform (WT) and inverse wavelet transform (IWT). Reverse diffusion in the wavelet domain generates the low-frequency coefficients at each scale; MSAL then learns and generates the high-frequency coefficients from these low-frequency components.
  • Figure 3: Generated images of SGM, WSGM, and our method on the CelebA-HQ dataset with only 16 discretization steps.
  • Figure 4: Performance of SGM, WSGM, and our method on the AFHQ-Cat and CelebA-HQ datasets for varying numbers of total sampling steps.
  • Figure 5: KL divergence of the sample distribution (scale = 0) and of the LL coefficient distributions (scale = 1, 2) relative to the standard Gaussian distribution. Images were downsampled to size $L\times L$ before wavelet decomposition.
  • ...and 14 more figures
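
The behavior in Figure 1 can be reproduced with a back-of-the-envelope calculation. This minimal sketch assumes the variance-preserving forward process $x_t = e^{-t}x_0 + \sqrt{1 - e^{-2t}}\,z$ with standard Gaussian $z$ (this summary does not specify the paper's noise schedule). Because an orthonormal wavelet transform maps isotropic Gaussian noise to isotropic Gaussian noise, each wavelet coefficient follows the same process, and a band with variance $v$ has signal-to-noise ratio $e^{-2t}v/(1 - e^{-2t})$, which falls to 1 at $t^\ast = \frac{1}{2}\log(1 + v)$. The SNR = 1 threshold and the band variances are hypothetical, not values from the paper.

```python
import math


def snr(t, band_var):
    """SNR of a band with variance band_var at time t > 0, assuming the
    variance-preserving forward process x_t = exp(-t) x_0 + sqrt(1 - exp(-2t)) z."""
    signal = math.exp(-2.0 * t) * band_var
    noise = -math.expm1(-2.0 * t)  # 1 - exp(-2t), computed accurately for small t
    return signal / noise


def crossing_time(band_var):
    """Time at which SNR falls to 1 (a hypothetical threshold):
    exp(-2t) v = 1 - exp(-2t)  =>  t* = 0.5 * log(1 + v)."""
    return 0.5 * math.log1p(band_var)


# Illustrative, made-up band variances: LL carries most of the energy.
band_vars = {"LL": 4.0, "LH": 0.05, "HL": 0.05, "HH": 0.01}
for name, v in band_vars.items():
    print(name, round(crossing_time(v), 4))
```

Low-variance high-frequency bands reach the threshold almost immediately, while the high-variance LL band survives much longer, matching the qualitative picture in Figure 1.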

Theorems & Definitions (9)

  • Theorem 1
  • Proposition 1
  • Definition 2: Forward Process in the Wavelet Domain
  • Definition 3: Reverse Process in the Wavelet Domain
  • Definition 4: Reverse Process at the Coarsest Level in the Wavelet Domain
  • Theorem 5
  • Theorem 6
  • Proposition 2
  • Proof
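
The definitions listed above separate a reverse process at the coarsest wavelet level from the reverse process at finer levels, and Figure 2 describes generating high-frequency coefficients from low-frequency ones before applying the inverse wavelet transform. A minimal coarse-to-fine sketch of that data flow, as one plausible reading of Figure 2, assumes a Haar wavelet and square images. `sample_coarse_ll` and `generate_high_freq` are hypothetical placeholders (a Gaussian draw and all-zero bands) standing in for the trained score-based sampler and the MSAL generator, so the output only demonstrates shapes and the order of operations, not image quality.

```python
import random


def ihaar2d(ll, lh, hl, hh):
    """Inverse of one level of the orthonormal 2D Haar transform."""
    h, w = len(ll), len(ll[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, v, hz, dg = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = (s + v + hz + dg) / 2
            out[2 * i][2 * j + 1] = (s + v - hz - dg) / 2
            out[2 * i + 1][2 * j] = (s - v + hz - dg) / 2
            out[2 * i + 1][2 * j + 1] = (s - v - hz + dg) / 2
    return out


def sample_coarse_ll(size, rng):
    """Hypothetical stand-in for the score-based sampler at the coarsest scale."""
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]


def generate_high_freq(ll):
    """Hypothetical stand-in for the adversarial high-frequency generator.

    A trained generator would map LL to (LH, HL, HH); zeros are returned here.
    """
    zeros = [[0.0] * len(ll[0]) for _ in ll]
    return zeros, [row[:] for row in zeros], [row[:] for row in zeros]


def coarse_to_fine(coarse_size, levels, seed=0):
    rng = random.Random(seed)
    ll = sample_coarse_ll(coarse_size, rng)  # low frequencies at the coarsest scale
    for _ in range(levels):
        lh, hl, hh = generate_high_freq(ll)  # high frequencies conditioned on LL
        ll = ihaar2d(ll, lh, hl, hh)         # IWT yields the next finer LL
    return ll


image = coarse_to_fine(coarse_size=4, levels=3)
print(len(image), len(image[0]))  # 32 32
```

Each level doubles the resolution, so the only place a full generative model touches every coefficient is the small coarsest grid; the finer scales only have to supply detail bands.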