Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

Arthur Jacot

Abstract

We introduce the Multilevel Euler-Maruyama (ML-EM) method to compute solutions of SDEs and ODEs using a range of approximators $f^1,\dots,f^k$ of the drift $f$ with increasing accuracy and computational cost, requiring only a few evaluations of the most accurate $f^k$ and many evaluations of the cheaper $f^1,\dots,f^{k-1}$. If the drift lies in the so-called Harder than Monte Carlo (HTMC) regime, i.e. it requires $\epsilon^{-\gamma}$ compute to be $\epsilon$-approximated for some $\gamma>2$, then ML-EM $\epsilon$-approximates the solution of the SDE with $\epsilon^{-\gamma}$ compute, improving over the traditional EM rate of $\epsilon^{-\gamma-1}$. In other words, it allows us to solve the SDE at the same cost as a single evaluation of the drift. In the context of diffusion models, the different levels $f^{1},\dots,f^{k}$ are obtained by training UNets of increasing sizes, and ML-EM allows us to perform sampling at a cost equivalent to a single evaluation of the largest UNet. Our numerical experiments confirm the theory: we obtain up to fourfold speedups for image generation on the CelebA dataset downscaled to $64\times64$, where we measure $\gamma\approx2.5$. Since the speedup is polynomial, we expect even stronger gains in practical applications involving networks that are orders of magnitude larger.
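As a concrete picture of how a single sampling step can mix the levels, here is a minimal sketch of the randomized telescoping idea behind ML-EM, assuming the drift approximators are exposed as plain callables. The interface is hypothetical: the paper's exact weighting (in particular the learned coefficients $\alpha_{k},\beta_{k}$ used in the experiments) may differ, and the diffusion coefficient is set to one for simplicity.

```python
import numpy as np

def ml_em_step(x, t, drifts, probs, eta, rng):
    """One Euler-Maruyama step with a multilevel drift estimate.

    drifts: callables [f1, ..., fk], cheapest/least accurate first.
    probs:  probs[k] is the probability of evaluating the level-k
            correction; probs[0] is unused since f1 is always evaluated.
    (Hypothetical interface; a sketch, not the paper's exact scheme.)
    """
    # Always evaluate the cheapest approximator.
    drift = drifts[0](x, t)
    # Telescoping corrections f^k - f^{k-1}: each is evaluated only with
    # probability probs[k] and reweighted by 1/probs[k], so the estimate
    # equals the most accurate drift f^k in expectation.
    for k in range(1, len(drifts)):
        if rng.random() < probs[k]:
            drift += (drifts[k](x, t) - drifts[k - 1](x, t)) / probs[k]
    # Plain EM update with unit diffusion coefficient for simplicity.
    return x + eta * drift + np.sqrt(eta) * rng.standard_normal(x.shape)
```

With geometrically decaying $p_{k}$, most steps touch only the small networks while the drift estimate stays unbiased, which is what yields a total cost of roughly one evaluation of the largest network over the whole trajectory.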

Paper Structure

This paper contains 9 sections, 2 theorems, 38 equations, 2 figures.

Key Result

Theorem 1

Under Assumptions \ref{assu:scaling_law} and \ref{assu:regularity_boundedness}, for any step size $\eta>0$, error $\epsilon>0$, and time $T=i\eta>0$, if we choose $k_{min}=-\left\lfloor \log_{2}c\right\rfloor$, $k_{max}=-\left\lfloor \log_{2}\left(\frac{2}{L}e^{L(T+\eta)}\epsilon\right)\right\rfloor$ and $p_{k}$, where ...
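Since the two cutoffs are explicit, they can be computed directly. Here is a minimal sketch under the theorem's notation, assuming $c$ and $L$ are the constants from the scaling-law and regularity/boundedness assumptions (which are not restated in this summary); the formula for the probabilities $p_{k}$ is truncated above and therefore omitted.

```python
import math

def level_cutoffs(c, L, T, eta, eps):
    """Level cutoffs k_min, k_max from Theorem 1.

    c, L: constants from the scaling-law and regularity/boundedness
    assumptions (assumed inputs; not restated in this summary).
    """
    k_min = -math.floor(math.log2(c))
    k_max = -math.floor(math.log2((2.0 / L) * math.exp(L * (T + eta)) * eps))
    return k_min, k_max
```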

Figures (2)

  • Figure 1: (Left) We compare the ML-EM and EM methods of generation for DDPM (top) and DDIM (bottom) by plotting the MSE between the generated sample and the 'true' sample (generated with a 1000-step DDPM/DDIM) with the same initial and Brownian noise; the $x$-axis is the time in seconds required to generate 200 images. Solid lines are the traditional EM method with different network sizes $f^{1},\dots,f^{5}$ and with numbers of steps ranging from $58$ to $933$. The crosses and dots are the ML-EM method with three networks $\{f^{1},f^{3},f^{5}\}$ and with either fixed probabilities or learned coefficients $\alpha_{k},\beta_{k}$ (see Section \ref{sec:Numerical-Experiments}). We add a $\Delta\in\{-3.0,-2.5,\dots,2.5,3.0\}$ to the $\beta_{k}$s and perform 15 trials over the sampling of the Bernoulli RVs (recall that the starting noise and Brownian motion are fixed). The Bernoulli samples that yield the smallest MSE can be memorized, so it is fair to compare the solid lines of classical EM to the best trials of ML-EM. (Right) The first 6 generated images for the 'true' sample and selected instances of EM (A, B) and ML-EM (C, D, E). For DDPMs, ML-EM with learned coefficients clearly outperforms all other methods, in some cases requiring four times less compute time than EM to reach the same MSE, or reaching a ten times smaller MSE at the same compute time. For DDIM the advantage of ML-EM is less clear, but still present. Visually, the main advantage of ML-EM appears to be that it avoids the discoloration/contrast issues present for EM with few steps. Interestingly, DDIM appears to suffer from these discolorations even with 1000 steps.
  • Figure 2: Estimating $\gamma\approx2.5$: we plot the denoising error $\epsilon$ minus $0.15$ against the evaluation time for a range of UNets $f^{1},\dots,f^{5}$. The constant $0.15$ was chosen by hand to approximate the minimal denoising error, i.e. so that the points align as closely as possible with a line in the log-log plot. On a log-log scale the points fit well with an $\epsilon\sim t^{-0.4}$ slope, which corresponds to $\gamma=\frac{1}{0.4}=2.5$ and thus lies in the HTMC regime ($\gamma>2$); a fit of this kind is sketched after this list.
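The fit behind Figure 2 is a simple linear regression in log-log coordinates. Below is a minimal sketch of this estimation; the inputs are placeholders (the real per-network evaluation times and denoising errors are read off the figure), and the $0.15$ floor is the hand-chosen constant from the caption.

```python
import numpy as np

def estimate_gamma(eval_times, errors, err_floor=0.15):
    """Estimate gamma from (error - floor) ~ time^(-1/gamma).

    eval_times, errors: measurements for the UNets f^1,...,f^5
    (hypothetical placeholders; real values come from Figure 2).
    err_floor: hand-chosen approximation of the minimal denoising
    error, 0.15 in the paper's plot; errors must exceed it.
    """
    # Linear fit in log-log coordinates; the slope is -1/gamma.
    slope, _intercept = np.polyfit(np.log(eval_times),
                                   np.log(np.asarray(errors) - err_floor), 1)
    return -1.0 / slope  # a slope of -0.4 gives gamma = 2.5
```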

Theorems & Definitions (6)

  • Example 1: Denoising Diffusion Probabilistic Model - DDPM
  • Example 2: Denoising Diffusion Implicit Model - DDIM
  • Remark 1
  • Theorem 1
  • Theorem 2
  • Proof