Variational Trajectory Optimization of Anisotropic Diffusion Schedules

Pengxi Liu, Zeyu Michael Li, Xiang Cheng

TL;DR

We present a variational framework for diffusion models with anisotropic noise schedules, parameterized by a matrix-valued path that allocates noise across subspaces, and develop an efficiently implementable reverse-ODE solver that generalizes the second-order Heun discretization to the anisotropic setting.

Abstract

We introduce a variational framework for diffusion models with anisotropic noise schedules parameterized by a matrix-valued path $M_t(\theta)$ that allocates noise across subspaces. Central to our framework is a trajectory-level objective that jointly trains the score network and learns $M_t(\theta)$, and that accommodates general parameterization classes of matrix-valued noise schedules. We further derive an estimator for the derivative of the score with respect to $\theta$, which enables efficient optimization of the schedule $M_t(\theta)$. For inference, we develop an efficiently implementable reverse-ODE solver that is an anisotropic generalization of the second-order Heun discretization algorithm. Across CIFAR-10, AFHQv2, FFHQ, and ImageNet-64, our method consistently improves upon the baseline EDM model in all NFE regimes. Code is available at https://github.com/lizeyu090312/anisotropic-diffusion-paper.
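To make the idea of "allocating noise across subspaces" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a square single-channel image, two subspaces defined by a low/high split of 2-D DCT coefficients, and a schedule of the form $M = \sigma_{\mathrm{low}}^2 P_1 + \sigma_{\mathrm{high}}^2 P_2$ evaluated at one fixed time. The function names, the `cutoff` split rule, and the parameterization are all hypothetical choices made for illustration.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D of shape (n, n), so that D @ D.T == I."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)    # rescale the constant row for orthonormality
    return D


def low_frequency_mask(n: int, cutoff: int) -> np.ndarray:
    """Hypothetical split rule: a coefficient is 'low frequency' when both
    of its 2-D DCT indices are below `cutoff`."""
    idx = np.arange(n)
    return (idx[:, None] < cutoff) & (idx[None, :] < cutoff)


def anisotropic_noise(x0, sigma_low, sigma_high, cutoff, rng):
    """Return x0 + M^{1/2} eps, where M scales the low- and high-frequency
    DCT subspaces by sigma_low^2 and sigma_high^2 respectively (assumed form)."""
    n = x0.shape[-1]
    D = dct_matrix(n)
    mask = low_frequency_mask(n, cutoff)
    eps = rng.standard_normal(x0.shape)
    eps_hat = D @ eps @ D.T                    # 2-D DCT of the white noise
    scale = np.where(mask, sigma_low, sigma_high)
    noise = D.T @ (scale * eps_hat) @ D        # inverse 2-D DCT
    return x0 + noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.zeros((8, 8))
    # More noise in the high-frequency band than in the low-frequency band.
    xt = anisotropic_noise(x0, sigma_low=0.1, sigma_high=1.0, cutoff=4, rng=rng)
    print(xt.shape)
```

Setting `sigma_low == sigma_high` recovers ordinary isotropic noising, because the DCT basis is orthonormal; the anisotropic case differs only in the per-subspace scale.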

Paper Structure

This paper contains 68 sections, 6 theorems, 72 equations, 8 figures, and 1 table.

Key Result

Lemma 3.1

Fix any schedule $M_t(\theta)$ satisfying condition (\ref{e:Mt-psd}). In the limit of infinite data and model capacity, the minimizer $\phi^*(\theta) = \arg\min_{\phi} L(\theta, \phi)$ satisfies

Figures (8)

  • Figure 1: Illustration of isotropic vs. anisotropic denoising. Top: a standard isotropic sampler denoises all directions uniformly. Bottom: an anisotropic sampler with two DCT subspaces, $V_1$ (low frequency) and $V_2$ (high frequency) (Section \ref{s:implementation_details}). Columns show intermediate reconstructions as $t$ decreases. The plot (right) displays the learned subspace schedules $g_1(t)$ and $g_2(t)$; the former is denoised more aggressively, so low-frequency structure from $V_1$ emerges earlier, while high-frequency details from $V_2$ emerge later. Illustration only: in practice the anisotropic and isotropic samplers reconstruct different images, and the gap between $g_1$ and $g_2$ is typically smaller (see Figs. \ref{fig:cifar10_schedule}--\ref{fig:imagenet_schedule}).
  • Figure 2: CIFAR-10 learned schedule analysis. (a) PCA-based geometric mean $\sqrt{g^{\mathrm{PCA}}_{1}(t)g^{\mathrm{PCA}}_{2}(t)}$ with a log-linear reference (gray dashed; log $y$-axis). (b) PCA anisotropy ratio $g^{\mathrm{PCA}}_{1}(t)/g^{\mathrm{PCA}}_{2}(t)$. (c) Class-conditional isotropic schedules $g_y(t)/\bar{g}(t)$, where $\bar{g}(t)=(\prod_{y=1}^{C} g_y(t))^{1/C}$ and $C=10$ is the number of classes. (d) Class-conditional anisotropic schedules over DCT subspaces: $g^{\mathrm{DCT}}_{k,y}(t)/\bar{g}_k(t)$ for $k\in\{1,2\}$ (solid/dashed), where $y$ indexes classes and $\bar{g}_k(t)$ is the geometric mean across classes.
  • Figure 3: AFHQv2 learned schedule analysis. Ratio between the two learned DCT-based schedules $g^{\mathrm{DCT}}_1(t)$ and $g^{\mathrm{DCT}}_2(t)$ over time.
  • Figure 4: FFHQ learned schedule analysis. Ratio between the two learned DCT-based schedules $g^{\mathrm{DCT}}_1(t)$ and $g^{\mathrm{DCT}}_2(t)$ over time.
  • Figure 5: ImageNet-64 learned schedule analysis. (a) Class-conditional isotropic schedules shown as $g_y(t)/\bar{g}(t)$, where $\bar{g}(t)$ denotes the geometric mean across classes. (b) Class-conditional anisotropic schedules summarized by the geometric mean $\sqrt{g^{\mathrm{DCT}}_{1,y}(t)g^{\mathrm{DCT}}_{2,y}(t)}$ and normalized by its class-wise geometric mean. (c) Class-conditional anisotropy ratios $g^{\mathrm{DCT}}_{1,y}(t)/g^{\mathrm{DCT}}_{2,y}(t)$. (d) PCA-based (class-conditional basis) anisotropic schedule ratio $g^{\mathrm{PCA}}_{1,y}(t)/g^{\mathrm{PCA}}_{2,y}(t)$. (e) PCA-based (class-conditional basis) class-conditional anisotropic schedules shown as $\sqrt{g^{\mathrm{PCA}}_{1,y}(t)g^{\mathrm{PCA}}_{2,y}(t)}$ normalized by the geometric mean across classes. (f) PCA-based (class-conditional basis) class-conditional anisotropy ratios $g^{\mathrm{PCA}}_{1,y}(t)/g^{\mathrm{PCA}}_{2,y}(t)$. All ratios are plotted on a logarithmic $y$-axis, and dashed horizontal lines indicate the reference value $1$.
  • ...and 3 more figures
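
The normalizations used in the schedule plots above, $g_y(t)/\bar{g}(t)$ with $\bar{g}(t)=(\prod_{y=1}^{C} g_y(t))^{1/C}$ and the anisotropy ratio $g_1(t)/g_2(t)$, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the array layout (classes along the first axis, time along the second) and the toy values are assumptions rather than the paper's data.

```python
import numpy as np


def normalize_by_geometric_mean(g):
    """Divide each class schedule g_y(t) by the geometric mean across classes.

    g: positive array of shape (C, T), one row per class y, one column per time t.
    Returns an array of the same shape holding g_y(t) / gbar(t).
    """
    g = np.asarray(g, dtype=float)
    log_g = np.log(g)
    # Mean of logs over classes = log of (prod_y g_y(t))^(1/C).
    log_gbar = log_g.mean(axis=0, keepdims=True)
    return np.exp(log_g - log_gbar)


def anisotropy_ratio(g1, g2):
    """Pointwise ratio g_1(t) / g_2(t) between two subspace schedules."""
    return np.asarray(g1, dtype=float) / np.asarray(g2, dtype=float)


if __name__ == "__main__":
    # Toy values for two classes and two time points (not from the paper).
    g = np.array([[1.0, 4.0],
                  [4.0, 1.0]])
    print(normalize_by_geometric_mean(g))
    print(anisotropy_ratio(g[0], g[1]))
```

Working in log space avoids overflow or underflow in the product when the number of classes is large. By construction, the normalized schedules multiply to 1 across classes at every time point, which is why a horizontal line at 1 is a natural reference in the plots.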

Theorems & Definitions (9)

  • Lemma 3.1: Exact score at optimality
  • Theorem 4.1
  • Corollary 4.2
  • Lemma A.1
  • proof: Proof of Lemma \ref{l:anisotropic_score}
  • Lemma A.2
  • proof: Proof of Lemma \ref{l:weighing_is_geodesic}
  • Lemma A.3
  • proof