U-Turn Diffusion
Hamidreza Behjoo, Michael Chertkov
TL;DR
This work analyzes how Ground Truth information is encoded in the Score Function of Score-Based Diffusion models and introduces U-Turn diffusion to shorten both forward and reverse dynamics while preserving detailed balance. By leveraging pre-trained score functions on ImageNet and CIFAR-10, the authors identify critical phase-transition times, Memorization Time $T_m$ and Speciation Time $T_s$, that govern when generated samples diverge from the initial GT and when they begin representing new classes. They develop a Gaussian-turn (G-Turn) framework and a set of quantitative tests (KS Gaussianity, SF-norm, U-Turn auto-correlation) to characterize these transitions, and they extend the approach to deterministic samplers. Across experiments, U-Turn demonstrates potential for more efficient sampling and provides new insights into non-Gaussianity, non-self-averaging behavior, and regime-dependent linearity of the score function, with practical implications for faster, flexible generative modeling.
Abstract
We investigate diffusion models generating synthetic samples from the probability distribution represented by the Ground Truth (GT) samples. We focus on how GT sample information is encoded in the Score Function (SF), computed (not simulated) from the Wiener-Ito (WI) linear forward process in the artificial time $t\in [0\to \infty]$, and then used as a nonlinear drift in the simulated WI reverse process with $t\in [\infty\to 0]$. We propose U-Turn diffusion, an augmentation of a pre-trained diffusion model, which shortens the forward and reverse processes to $t\in [0\to T_u]$ and $t\in [T_u\to 0]$. The U-Turn reverse process is initialized at $T_u$ with a sample from the probability distribution of the forward process (itself initialized at $t=0$ with a GT sample), ensuring a detailed balance relation between the shortened forward and reverse processes. Our experiments on the class-conditioned SF of the ImageNet dataset and the multi-class, single SF of the CIFAR-10 dataset reveal a critical Memorization Time $T_m$, beyond which generated samples diverge from the GT sample used to initialize the U-Turn scheme, and a Speciation Time $T_s$, where for $T_u > T_s > T_m$, samples begin representing different classes. We further examine the role of SF non-linearity through a Gaussian Test, comparing empirical and Gaussian-approximated U-Turn auto-correlation functions, and showing that the SF becomes effectively affine for $t > T_s$, and approximately affine for $t\in [T_m,T_s]$.
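The U-Turn scheme described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes an Ornstein-Uhlenbeck forward process $dx = -x\,dt + \sqrt{2}\,dW$, replaces the pre-trained SF with the exact score of a small empirical GT set (a Gaussian mixture at every $t>0$), and integrates the reverse process with a plain Euler-Maruyama step. The function names `empirical_score` and `u_turn`, the three-point GT set, and the step count are all hypothetical choices made for illustration.

```python
import numpy as np

def empirical_score(x, t, gt):
    """Exact score of the forward marginal p_t for an empirical GT set.

    Assumes the forward process dx = -x dt + sqrt(2) dW, so that
    x(t) | x(0) ~ N(exp(-t) x(0), (1 - exp(-2t)) I).
    """
    a = np.exp(-t)                      # mean contraction at time t
    var = 1.0 - np.exp(-2.0 * t)        # forward noise variance at time t
    diff = a * gt - x                   # (N, d) displacement to each contracted GT point
    logw = -np.sum(diff**2, axis=1) / (2.0 * var)
    w = np.exp(logw - logw.max())       # numerically stabilized mixture weights
    w /= w.sum()
    return (w[:, None] * diff).sum(axis=0) / var

def u_turn(x0, gt, T_u, n_steps=200, rng=None):
    """Run the forward process from a GT sample x0 to T_u, then reverse to t = 0."""
    rng = np.random.default_rng() if rng is None else rng
    # Forward leg: sample x(T_u) | x0 in closed form (no simulation needed).
    a, var = np.exp(-T_u), 1.0 - np.exp(-2.0 * T_u)
    x = a * x0 + np.sqrt(var) * rng.standard_normal(x0.shape)
    # Reverse leg: Euler-Maruyama on the reverse SDE, stepping t from T_u down to 0.
    dt = T_u / n_steps
    for k in range(n_steps):
        t = T_u - k * dt                # the score is only evaluated at t >= dt > 0
        drift = x + 2.0 * empirical_score(x, t, gt)
        x = x + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(x.shape)
    return x

# Hypothetical three-point GT set in two dimensions.
gt = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0]])
rng = np.random.default_rng(0)
short = u_turn(gt[0], gt, T_u=0.05, rng=rng)   # short U-Turn from the first GT point
long = u_turn(gt[0], gt, T_u=5.0, rng=rng)     # long U-Turn from the same GT point
print(short, long)
```

In this toy setting, a small $T_u$ leaves the forward sample close to the initializing GT point, so the reverse leg is pulled back toward it, which mirrors the memorization regime. A large $T_u$ erases most of that dependence, so the reverse leg may end near any GT point. The sketch does not attempt to locate $T_m$ or $T_s$.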
