Towards understanding Diffusion Models (on Graphs)

Solveig Klepper

TL;DR

This work provides an overview of the most prominent approaches to diffusion models, drawing attention to their striking analogies -- namely, how seemingly diverse methodologies converge to a similar mathematical formulation of the core problem.

Abstract

Diffusion models have emerged from various theoretical and methodological perspectives, each offering unique insights into their underlying principles. In this work, we provide an overview of the most prominent approaches, drawing attention to their striking analogies -- namely, how seemingly diverse methodologies converge to a similar mathematical formulation of the core problem. While our ultimate goal is to understand these models in the context of graphs, we begin by conducting experiments in a simpler setting to build foundational insights. Through an empirical investigation of different diffusion and sampling techniques, we explore three critical questions: (1) What role does noise play in these models? (2) How significantly does the choice of the sampling method affect outcomes? (3) What function is the neural network approximating, and is high complexity necessary for optimal performance? Our findings aim to enhance the understanding of diffusion models and in the long run their application in graph machine learning.

Paper Structure (19 sections, 17 equations, 11 figures)

Figures (11)

  • Figure 1: General idea of denoising diffusion models. The forward process is modelled by a Markov process. The reverse process is unknown and needs to be approximated; this is usually done with a neural network.
  • Figure 2: Reparametrization in sampling. The model does not predict the previous data point but the noise relative to the clean image. The predicted noise and the diffusion process are used to interpolate between the clean image $x_0$ and the input $x_t$ to sample $x_{t-1}$ with the desired step size. $a_t$ and $b_t$ are functions of $t$ that encode the step size and manage the interpolation between the clean and the noisy image.
  • Figure 3: Ground truth data distribution used to sample training points. The left figure shows the density, and the right figure shows the log-likelihood. The arrows indicate the direction of the score $\nabla_x \log p(x)$.
  • Figure 4: Visualization of the three investigated sampling methods. Red indicates the part that the model predicts.
  • Figure 5: The noise schedule makes a difference. For linear diffusion, most information is lost in the early time steps, and later steps hold little to no information about either the original distribution or the diffusion process. Controlled by $\bar{\alpha}$, the information in the cosine diffusion process degrades more slowly, so later steps still hold valuable transition information for training. Visualizations of the diffusion process in panels (d) and (e) show timesteps $t = 0, 27, 54, 81, 99$ from left to right.
  • ...and 6 more figures
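
The contrast described in the caption of Figure 5 can be illustrated with a minimal sketch that compares $\bar{\alpha}_t$ under a linear and a cosine schedule. The formulas below are the commonly used ones from the DDPM literature, not necessarily the ones used in this paper: the function names, the $\beta$ range (rescaled for a short chain), the offset `s`, and `T = 100` (chosen to match the timesteps 0 to 99 mentioned in the caption) are all assumptions made for illustration.

```python
import numpy as np


def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal level for a linear beta schedule.

    Assumption: the usual 1000-step beta range is rescaled by 1000 / T
    so that a short chain still ends close to pure noise.
    """
    scale = 1000.0 / T
    betas = np.linspace(scale * beta_start, scale * beta_end, T)
    return np.cumprod(1.0 - betas)


def cosine_alpha_bar(T, s=0.008):
    """Cumulative signal level for a cosine schedule with small offset s."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1.0 + s) * np.pi / 2.0) ** 2
    return f[1:] / f[0]


T = 100  # assumed chain length
linear = linear_alpha_bar(T)
cosine = cosine_alpha_bar(T)

# sqrt(alpha_bar_t) weights the clean sample x_0 in the forward process
# x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
for step in (0, 27, 54, 81, 99):
    print(f"t={step:3d}  linear={linear[step]:.4f}  cosine={cosine[step]:.4f}")
```

Under these assumptions, the linear schedule drives $\bar{\alpha}_t$ towards zero well before the end of the chain, whereas the cosine schedule keeps a larger share of the signal through the middle timesteps, which is the qualitative behaviour the caption describes.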