
Nearest Neighbour Score Estimators for Diffusion Generative Models

Matthew Niedoba, Dylan Green, Saeid Naderiparizi, Vasileios Lioutas, Jonathan Wilder Lavington, Xiaoxuan Liang, Yunpeng Liu, Ke Zhang, Setareh Dabiri, Adam Ścibior, Berend Zwartsenberg, Frank Wood

TL;DR

This paper tackles high-variance and biased score estimation in diffusion generative models by introducing a nearest-neighbour self-normalized importance sampling (SNIS) estimator. The method builds a tailored proposal based on the $k$ nearest data points to the noisy sample, enabling accurate estimation of the posterior mean $\mathbb{E}[\mathbf{x}|\mathbf{z}, t]$ with reduced variance, and derives bounds on the estimator's covariance. Empirically, the approach yields near-zero bias and variance on CIFAR-10, outperforms single-sample MC, STF, and often EDM in score estimation, and accelerates consistency training with improved FID/IS. Additionally, the estimator can replace a learned score network for PF-ODE sampling, offering a path toward more efficient and flexible diffusion-based generation and potential distillation-based training. The work suggests further avenues such as latent-space representations for neighbours and alternative distance metrics to enhance performance on higher-dimensional data.

Abstract

Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research.
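As a rough illustration of the nearest-neighbour SNIS idea summarised above, the following NumPy sketch estimates the posterior mean $\mathbb{E}[\mathbf{x}|\mathbf{z}, t]$ over a finite dataset. It is not the authors' implementation. It assumes an EDM-style forward process $\mathbf{z} = \mathbf{x} + t\,\boldsymbol{\epsilon}$ with a uniform prior over the training points, and it assumes a proposal that uses the exact unnormalised posterior weight on the $k$ nearest points and caps every other point at the weight of the $k$-th neighbour (one reading of the Figure 1 caption). All function names are hypothetical, and the neighbour search is brute force purely for brevity.

```python
import numpy as np


def log_posterior_weights(z, data, t):
    """Unnormalised log p(x_i | z, t), assuming z ~ N(x_i, t^2 I) and a uniform prior."""
    sq_dist = np.sum((data - z) ** 2, axis=1)
    return -sq_dist / (2.0 * t ** 2)


def exact_posterior_mean(z, data, t):
    """Exact E[x | z, t] by summing over every data point (reference value)."""
    logw = log_posterior_weights(z, data, t)
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    return (w / w.sum()) @ data


def knn_snis_posterior_mean(z, data, t, k, n, rng):
    """Self-normalised importance sampling estimate of E[x | z, t].

    Assumed proposal (an illustration, not taken verbatim from the paper):
    exact weights on the k nearest points, and the k-th neighbour's weight
    as an upper bound for every other point.
    """
    # Brute-force distances; a practical system would use a nearest-neighbour index.
    logw = log_posterior_weights(z, data, t)
    nn = np.argpartition(-logw, k - 1)[:k]  # k largest weights = k nearest points
    log_q = np.full(len(data), logw[nn].min())  # cap for non-neighbours
    log_q[nn] = logw[nn]
    q = np.exp(log_q - log_q.max())
    q /= q.sum()

    idx = rng.choice(len(data), size=n, p=q)  # n draws from the proposal
    log_ratio = logw[idx] - log_q[idx]  # normalising constants cancel in SNIS
    w = np.exp(log_ratio - log_ratio.max())
    return (w / w.sum()) @ data[idx]


def score_from_posterior_mean(z, mean, t):
    """Score of the noisy marginal under the assumed forward process."""
    return (mean - z) / t ** 2
```

Under these assumptions, calling `knn_snis_posterior_mean(z, data, t, k, n, np.random.default_rng(0))` on an array `data` of shape `(N, d)` returns a length-`d` estimate, and `score_from_posterior_mean` converts it into a score estimate. Capping non-neighbours at the $k$-th neighbour's weight keeps every importance ratio at or below one, which is what keeps the self-normalised weights well behaved.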

Paper Structure (35 sections, 3 theorems, 49 equations, 12 figures, 4 tables, 2 algorithms)


Key Result

Theorem 4.1

Let $t \in (0, \infty)$, $\hat{\mu}_{\text{MC}}$ be the Monte Carlo estimator defined by eq:simple_mc, and $\hat{\mu}_{\text{KNN}}$ be the estimator described by eq:SNIS_estimator with proposal given by eq:proposal. Then for a fixed $n \geq 1$,

Figures (12)

  • Figure 1: Illustration of our proposal and the posterior across three phases of a toy 1D diffusion process. Left: For small $t$, the posterior probability is concentrated on the single closest element to $\mathbf{z}$. Middle: For intermediate $t$, we upper bound the posterior probability for non-neighbour elements, which results in underweighting the neighbours. Right: As $t$ becomes large, the posterior approaches a uniform distribution and the proposal matches the posterior well.
  • Figure 2: Visualization of posterior mean estimators on CIFAR-10 images. Top: Noisy $\mathbf{z}$ samples from $p_t(\mathbf{z} | \mathbf{x}^{(i)})$ for increasing noise levels. Second Row: True posterior mean. Third Row: Posterior mean estimates from our estimator. Our estimator nearly perfectly matches the true mean levels. Bottom: Estimated posterior mean from a trained diffusion model. The diffusion model does not match the high frequency features for low noise levels, but has fewer artifacts at the highest noise level than our method.
  • Figure 3: Estimator performance on CIFAR-10. Our estimator reduces bias and variance to near zero, significantly outperforming even a network score estimator. In contrast, STF reduces variance but has significant bias for intermediate $t$.
  • Figure 4: Impact of KNN score estimator performance on Consistency Training. Horizontal lines indicate minimum FID per model, vertical lines indicate when FID improves on baseline iCT. Top: Effect of varying KNN search size. Bottom: Effect of varying estimator sample size.
  • Figure 5: Comparison of sample FID versus $t_{\mathrm{switch}}$ in our hybrid sampling approach. Initializing diffusion sampling with KNN PF-ODE integration with our score estimator yields identical performance to forward process initialization. For $t<2$, STF is unsuitable for PF-ODE integration.
  • ...and 7 more figures

Theorems & Definitions (8)

  • Theorem 4.1
  • proof
  • Theorem 4.2
  • proof
  • proof
  • Lemma 3.1
  • proof
  • proof