U-Turn Diffusion

Hamidreza Behjoo, Michael Chertkov

TL;DR

This work analyzes how Ground Truth information is encoded in the Score Function of Score-Based Diffusion models and introduces U-Turn diffusion to shorten both forward and reverse dynamics while preserving detailed balance. By leveraging pre-trained score functions on ImageNet and CIFAR-10, the authors identify critical phase-transition times, Memorization Time $T_m$ and Speciation Time $T_s$, that govern when generated samples diverge from the initial GT and when they begin representing new classes. They develop a Gaussian-turn (G-Turn) framework and a set of quantitative tests (KS Gaussianity, SF-norm, U-Turn auto-correlation) to characterize these transitions, and they extend the approach to deterministic samplers. Across experiments, U-Turn demonstrates potential for more efficient sampling and provides new insights into non-Gaussianity, non-self-averaging behavior, and regime-dependent linearity of the score function, with practical implications for faster, flexible generative modeling.

Abstract

We investigate diffusion models generating synthetic samples from the probability distribution represented by the Ground Truth (GT) samples. We focus on how GT sample information is encoded in the Score Function (SF), computed (not simulated) from the Wiener-Ito (WI) linear forward process in the artificial time $t\in [0\to \infty]$, and then used as a nonlinear drift in the simulated WI reverse process with $t\in [\infty\to 0]$. We propose U-Turn diffusion, an augmentation of a pre-trained diffusion model, which shortens the forward and reverse processes to $t\in [0\to T_u]$ and $t\in [T_u\to 0]$. The U-Turn reverse process is initialized at $T_u$ with a sample from the probability distribution of the forward process (initialized at $t=0$ with a GT sample), ensuring a detailed balance relation between the shortened forward and reverse processes. Our experiments on the class-conditioned SF of the ImageNet dataset and the multi-class, single SF of the CIFAR-10 dataset reveal a critical Memorization Time $T_m$, beyond which generated samples diverge from the GT sample used to initialize the U-Turn scheme, and a Speciation Time $T_s$, such that for $T_u > T_s > T_m$ samples begin representing different classes. We further examine the role of SF non-linearity through a Gaussian Test, comparing empirical and Gaussian-approximated U-Turn auto-correlation functions, and showing that the SF becomes effectively affine for $t > T_s$ and approximately affine for $t\in [T_m,T_s]$.
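The U-Turn scheme described in the abstract can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an Ornstein-Uhlenbeck forward process $dx = -x\,dt + \sqrt{2}\,dW$, replaces the pre-trained neural SF with the exact score of the noised empirical distribution of a tiny GT set, and uses a plain Euler-Maruyama discretization. The names `empirical_score`, `u_turn`, `n_steps`, and `t_min` are hypothetical.

```python
import numpy as np

def empirical_score(x, t, gt):
    """Exact score of the OU-noised empirical distribution of `gt` at time t.

    Assumption: forward process dx = -x dt + sqrt(2) dW, so that
    x_t | x_0 ~ N(x_0 * exp(-t), (1 - exp(-2t)) * I).
    """
    mean = gt * np.exp(-t)                     # (N, d) conditional means
    var = 1.0 - np.exp(-2.0 * t)               # shared conditional variance
    diff = mean - x                            # (N, d)
    logw = -np.sum(diff**2, axis=1) / (2.0 * var)
    w = np.exp(logw - logw.max())              # numerically stable weights
    w /= w.sum()                               # posterior over GT samples
    return (w[:, None] * diff).sum(axis=0) / var

def u_turn(x0, t_u, gt, rng, n_steps=200, t_min=1e-3):
    """Run the forward process 0 -> t_u from x0, then the reverse t_u -> t_min."""
    # Forward leg: sample directly from the closed-form transition kernel.
    noise = rng.standard_normal(x0.shape)
    x = x0 * np.exp(-t_u) + np.sqrt(1.0 - np.exp(-2.0 * t_u)) * noise

    # Reverse leg: Euler-Maruyama on the reverse-time SDE, stepping t downward.
    ts = np.linspace(t_u, t_min, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_cur - t_next
        drift = x + 2.0 * empirical_score(x, t_cur, gt)
        x = x + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = np.array([[3.0, 0.0], [-3.0, 0.0]])   # two toy "classes"
    for t_u in (0.05, 0.5, 3.0):
        print(t_u, u_turn(gt[0], t_u, gt, rng))
```

Because this sketch uses the exact empirical score, the reverse leg always collapses onto one of the GT points. For small `t_u` it returns to the point it started from, while for large `t_u` it may land on either one, which gives a crude picture of the memorization and speciation regimes. A learned SF, as used in the paper, would not memorize in this way.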

Paper Structure

This paper contains 26 sections, 21 equations, 16 figures, 1 table, 2 algorithms.

Figures (16)

  • Figure 1: Illustration of the U-Turn concept for (a) ImageNet and (b) CIFAR-10. Our analysis, based on newly introduced tests, reveals that making the U-Turn earlier is beneficial, but not too early. The dark-yellow region indicates a small vicinity near the origin where the approximation of the score function by a NN is crucial to avoid memorization (i.e., generation of ground truth samples). The pink regions mark the range where the memorization transition, $T_m$, occurs. This transition is observed in both single-class (ImageNet) and multi-class settings. In the multi-class case (with a single score function for the entire dataset), we also observe the speciation transition, $T_s$, previously reported in biroli_dynamical_2024 and depicted in light blue in the right panel. This schematic illustration emphasizes an important observation of this work: neither $T_m$ nor $T_s$ is self-averaging. Instead, both fluctuate across GT samples and even between different realizations of the forward process, and therefore appear as ranges (distributions) rather than single values.
  • Figure 2: Kolmogorov-Smirnov (KS) test applied to the ImageNet dataset. Different colors represent different classes/labels: 008 (hen), 950 (orange), 698 (palace), and 762 (restaurant, eating house, eatery).
  • Figure 3: Average of the normalized score function 2-norm test for ImageNet. Consistent with Fig. 2, different colors represent different classes/labels.
  • Figure 4: Results of running the reverse process with different initializations: Gaussian ($x_T \sim \mathcal{N}(0,I)$), Uniform ($x_T \sim \text{Uniform}[-1,1]$), Zero ($x_T = 0$), GT Data ($x_T \sim p_{\text{data}}$), and Bernoulli ($x_T \sim p_{\text{Bernoulli}}$).
  • Figure 5: ImageNet Visualization: U-Turn at different times $T_u$ with inputs of the forward and reverse processes conditioned to the same class.
  • ...and 11 more figures
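
The KS Gaussianity test named in the Figure 2 caption can be illustrated on toy data. This summary does not specify how the paper applies the test, so the sketch below makes its own assumptions: samples are pushed through an Ornstein-Uhlenbeck forward process whose stationary law is $\mathcal{N}(0,1)$, and the one-sample KS statistic against the standard normal CDF is computed at a given time. The helper names `ks_statistic_vs_standard_normal` and `forward_marginal` are hypothetical.

```python
import math
import numpy as np

def ks_statistic_vs_standard_normal(samples):
    """One-sample Kolmogorov-Smirnov statistic against N(0, 1)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    # Standard normal CDF evaluated at the sorted samples.
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in x]))
    upper = np.arange(1, n + 1) / n - cdf      # empirical CDF above the model
    lower = cdf - np.arange(0, n) / n          # empirical CDF below the model
    return float(max(upper.max(), lower.max()))

def forward_marginal(gt, t, rng):
    """Sample x_t given x_0 = gt for the assumed process dx = -x dt + sqrt(2) dW."""
    scale = np.sqrt(1.0 - np.exp(-2.0 * t))
    return gt * np.exp(-t) + scale * rng.standard_normal(gt.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.choice([-3.0, 3.0], size=2000)    # bimodal 1D toy "dataset"
    for t in (0.1, 0.5, 1.0, 2.0, 4.0):
        d = ks_statistic_vs_standard_normal(forward_marginal(gt, t, rng))
        print(f"t = {t:4.1f}  KS distance = {d:.3f}")
```

In this toy setting the KS distance is large at early times, when the noised samples are still bimodal, and shrinks toward the sampling-noise floor as $t$ grows and the marginal approaches a Gaussian.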

Theorems & Definitions (1)

  • Remark