Neural Diffusion Processes

Vincent Dutordoir; Alan Saul; Zoubin Ghahramani; Fergus Simpson

TL;DR

Neural Diffusion Processes (NDPs) extend probabilistic diffusion models to function spaces by diffusing over finite marginals and enforcing stochastic-process properties, such as exchangeability, through a bi-dimensional attention block. This yields a flexible, non-Gaussian prior over functions that can emulate GP posteriors, marginalise hyperparameters, and perform conditional sampling given context data, and that supports tasks such as image regression and global optimisation. Empirically, NDPs match or surpass Neural Processes on various benchmarks and approach GP performance in regression and Bayesian optimisation, while also enabling joint modelling of inputs and outputs. The work presents a practical approach to learning distributions over functions that builds stochastic-process properties directly into the architecture.

Abstract

Neural network approaches for meta-learning distributions over functions have desirable properties such as increased flexibility and a reduced complexity of inference. Building on the successes of denoising diffusion models for generative modelling, we propose Neural Diffusion Processes (NDPs), a novel approach that learns to sample from a rich distribution over functions through its finite marginals. By introducing a custom attention block we are able to incorporate properties of stochastic processes, such as exchangeability, directly into the NDP's architecture. We empirically show that NDPs can capture functional distributions close to the true Bayesian posterior, demonstrating that they can successfully emulate the behaviour of Gaussian processes and surpass the performance of neural processes. NDPs enable a variety of downstream tasks, including regression, implicit hyperparameter marginalisation, non-Gaussian posterior prediction and global optimisation.
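As a rough illustration of what diffusing over finite marginals can look like, the sketch below applies the standard denoising-diffusion forward (noising) step to function values observed at a fixed, finite set of inputs. The cosine schedule, the number of steps T = 500, the sine test function, and the helper names `cosine_alpha_bar` and `forward_noise` are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level alpha_bar_t under a cosine schedule (assumed choice)."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def forward_noise(y0, t, T, rng):
    """Draw y_t ~ N(sqrt(alpha_bar_t) * y0, (1 - alpha_bar_t) * I).

    Only the function values y0 are corrupted; the input locations stay fixed.
    Returns the noised values and the noise that was added.
    """
    a_bar = cosine_alpha_bar(t, T)
    eps = [rng.gauss(0.0, 1.0) for _ in y0]
    yt = [math.sqrt(a_bar) * y + math.sqrt(1.0 - a_bar) * e
          for y, e in zip(y0, eps)]
    return yt, eps


rng = random.Random(0)
T = 500                                         # hypothetical number of diffusion steps
x = [i / 9 for i in range(10)]                  # finite set of input locations
y0 = [math.sin(2 * math.pi * xi) for xi in x]   # one function evaluated at x

y_early, _ = forward_noise(y0, 1, T, rng)       # still close to y0
y_late, _ = forward_noise(y0, T, T, rng)        # essentially pure Gaussian noise
```

A noise prediction model would then be trained to recover `eps` from the noised values, the inputs, and the step index; that part is omitted here.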

Paper Structure (52 sections, 11 theorems, 43 equations, 13 figures, 3 tables, 3 algorithms)

Introduction
Contributions
Background
Gaussian Processes
Neural Processes and Meta-Learning Functions
Probabilistic Denoising Diffusion Models
Neural Diffusion Processes
Data, Forward and Reverse Process
Data
Forward process
Backward kernel
Objective
Prior and Conditional Sampling
Prior
...and 37 more sections

Key Result

Proposition 4.1

Let $\Pi_N$ and $\Pi_D$ be the set of all permutations of indices $\{1, \ldots, N\}$ and $\{1, \ldots, D\}$, respectively. Let $\mathbf{s} \in \mathbb{R}^{N \times D \times H}$ and $(\pi_n \circ \mathbf{s}) \in \mathbb{R}^{N \times D \times H}$ denote a tensor where the ordering of indices in the first dimension are gi
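The statement above is cut off, so the following is only a toy check of the kind of property it appears to concern: that an attention operation commutes with a permutation of the first (data point) dimension. It uses plain, unparameterised dot-product self-attention on a single N x H slice, which is an assumption made for this example and not the paper's bi-dimensional attention block; the helper name `self_attention` and the sizes are hypothetical.

```python
import math
import random


def self_attention(rows):
    """Unparameterised dot-product self-attention over a list of H-dim vectors."""
    h = len(rows[0])
    out = []
    for q in rows:
        # Scaled dot-product scores of this query against every row.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(h) for k in rows]
        m = max(scores)                      # subtract max for a stable softmax
        w = [math.exp(sc - m) for sc in scores]
        z = sum(w)
        out.append([sum(wi / z * v[j] for wi, v in zip(w, rows))
                    for j in range(h)])
    return out


rng = random.Random(0)
N, H = 5, 3
s = [[rng.gauss(0.0, 1.0) for _ in range(H)] for _ in range(N)]
perm = [3, 0, 4, 1, 2]                       # a permutation of the N row indices

# Permute the rows first, then attend.
permute_then_attend = self_attention([s[i] for i in perm])

# Attend first, then permute the output rows the same way.
attended = self_attention(s)
attend_then_permute = [attended[i] for i in perm]

# Equivariance means both orders agree up to floating-point rounding.
max_gap = max(abs(a - b)
              for ra, rb in zip(permute_then_attend, attend_then_permute)
              for a, b in zip(ra, rb))
```

Here `max_gap` should be at the level of rounding error, since reordering the rows only reorders the terms inside each softmax-weighted sum.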

Figures (13)

Figure 1: Posterior samples conditioned on a context dataset (black dots) for different probabilistic models.
Figure 2: Architecture of the noise prediction model, utilised at each step within the Neural Diffusion Process. The greyed box represents the bi-dimensional attention block, as discussed in Section \ref{sec:bidim}.
Figure 3: Hyperparameter marginalisation: Samples from the NDP, conditioned on an increasing number of data points (black dots), are illustrated in the top row. A sample is coloured according to its most likely lengthscale. The bottom row shows a histogram of likely lengthscales from the produced samples. As more data points are provided, the distribution of likely lengthscales converges from the prior over lengthscales to the lengthscale that was used to produce the data ($\ell = 0.3$).
Figure 4: Representing a step function using NDPs. Figures (a) and (b) show samples from the model's prior and conditional distribution, respectively. Figure (c) illustrates the non-Gaussian posterior an NDP can capture, which a GP, by definition, cannot.
Figure 5: NDPs for image regression on MNIST and CelebA ($32\times32$). Figures (a) and (b) show conditional samples where the context datasets are, from top to bottom: the upper half of the pixels, the left half of the pixels, a random selection of 5% of the pixels, and a random selection of 10% of the pixels. Figure (c) plots the MSE of the NDP's predictions for an increasing number of context points.
...and 8 more figures

Theorems & Definitions (29)

Proposition 4.1
proof
Proposition 4.2
proof
Definition 3.1
Definition 3.2
Definition 3.3
Definition 3.4
Lemma 3.5
proof
...and 19 more
