
Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models

Santiago Aranguri, Francesco Insulla

TL;DR

The paper studies how flow-based (diffusion-like) generative models learn features at different scales when sampling from a high-dimensional, unbalanced two-mode Gaussian mixture. By introducing a time-dilated training schedule, it reveals a two-phase learning process: the model first learns the mode probabilities and then the variances, with the velocity field reducing to a small subspace in each phase. The authors give sharp asymptotic characterizations of the learned parameters, show that the generated samples recover both the mixing probability $p$ and the variance $\sigma^2$, and demonstrate practical utility on MNIST by concentrating feature-specific training on adaptively chosen time intervals. The work offers a principled approach to schedule design in diffusion-like models and suggests concrete strategies for efficient, feature-aware training on real data.
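As a concrete picture of the target distribution, here is a minimal NumPy sketch that samples an unbalanced two-mode Gaussian mixture and computes the overlap $M=\mu\cdot x/d$ used in Figure 1. The parameterization is an assumption for illustration: modes at $\pm\mu$ with $\mu$ the all-ones vector, weights $p$ and $1-p$, and a separate isotropic variance per mode. The paper's exact placement of $\sigma^2$ may differ, and `sample_mixture` is a hypothetical helper.

```python
import numpy as np

def sample_mixture(K, d, p=0.8, var=(1.0, 1.0), seed=0):
    """Draw K samples from p*N(mu, var[0] I) + (1-p)*N(-mu, var[1] I).

    Assumption: mu = ones(d), so |mu|^2 = d (a hypothetical choice).
    """
    rng = np.random.default_rng(seed)
    mu = np.ones(d)
    plus = rng.random(K) < p                      # True -> mode +mu, with probability p
    sign = np.where(plus, 1.0, -1.0)
    std = np.where(plus, np.sqrt(var[0]), np.sqrt(var[1]))
    x = sign[:, None] * mu + std[:, None] * rng.standard_normal((K, d))
    return x, mu, plus

x, mu, plus = sample_mixture(K=5000, d=500, p=0.8, var=(0.5, 1.0))
M = x @ mu / x.shape[1]                           # overlap with the mode direction
print((M > 0).mean())                             # fraction in the +mu mode, close to p
print(np.var(x[plus] - mu))                       # per-coordinate variance of the +mu mode
```

Because $|\mu|^2=d$, the noise contribution to $M$ has standard deviation of order $1/\sqrt{d}$, so in high dimension the sign of $M$ identifies the mode essentially without error.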

Abstract

We analyze the training of a two-layer autoencoder used to parameterize a flow-based generative model for sampling from a high-dimensional Gaussian mixture. Previous work shows that, without an appropriate time schedule, the phase in which the relative probability between the modes is learned disappears as the dimension goes to infinity. We introduce a time dilation that solves this problem. This enables us to characterize the learned velocity field, finding a first phase where the probability of each mode is learned and a second phase where the variance of each mode is learned. We find that the autoencoder representing the velocity field learns to simplify by estimating only the parameters relevant to each phase. Turning to real data, we propose a method that, for a given feature, finds intervals of time where training improves accuracy on that feature the most. Since practitioners typically sample training times uniformly, our method enables more efficient training. We provide preliminary experiments validating this approach.
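Once an interval of interest has been found, one simple way to emphasize it is to replace uniform sampling of training times with a mixture. The NumPy sketch below implements the scheme described in the caption of Figure 3(a): with probability $1/2$, draw $t$ uniformly from $[0.2, 0.6]$, and otherwise draw it uniformly from the rest of $[0,1]$. The function name and its defaults are hypothetical.

```python
import numpy as np

def sample_training_times(n, interval=(0.2, 0.6), weight=0.5, seed=0):
    """With probability `weight`, draw t uniformly from `interval`;
    otherwise draw t uniformly from [0, 1] with that interval removed."""
    rng = np.random.default_rng(seed)
    lo, hi = interval
    inside = rng.random(n) < weight
    t_in = rng.uniform(lo, hi, n)
    # Uniform on [0, lo) U [hi, 1): draw on a segment of length 1 - (hi - lo),
    # then shift the part that lands at or above `lo` past the interval.
    u = rng.uniform(0.0, 1.0 - (hi - lo), n)
    t_out = np.where(u < lo, u, u + (hi - lo))
    return np.where(inside, t_in, t_out)

t = sample_training_times(100_000)
print(((t >= 0.2) & (t <= 0.6)).mean())   # about 0.5, versus 0.4 under uniform sampling
```

Under uniform sampling, the interval $[0.2, 0.6]$ receives 40% of the training times. Under this scheme it receives 50%, and the remaining half is spread uniformly over the rest of $[0,1]$.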


Paper Structure

This paper contains 23 sections, 15 theorems, 115 equations, and 3 figures.

Key Result

Proposition 1

Let $X_t$ be the solution to the probability flow ODE from equation (eq:ode:gen:1) with $\alpha_t=1-\tau_t$ and $\beta_t=\tau_t$, where $\tau_t$ is defined in equation (eq:time_dil). Then for $t\in[0,2]$ we have … where $\sigma_t$ is characterized below. We further have the following phases.
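To make the setting of Proposition 1 concrete, the following NumPy sketch integrates the probability flow ODE with explicit Euler. It replaces the trained autoencoder with the exact velocity field of the interpolant $I_t=\alpha_t x_0+\beta_t x_1$, where $x_0\sim\mathcal N(0,I)$ and $x_1$ is a two-mode mixture with equal unit variances. It then compares the non-dilated schedule $\alpha_t=1-t$, $\beta_t=t$ against a dilated one. The dilation is a hypothetical piecewise-linear stand-in for equation (eq:time_dil), not the paper's definition: $\tau_t=\kappa t/\sqrt d$ on $[0,1]$, then rising linearly to $1$ on $[1,2]$. Under this choice, the speciation scale $\tau\approx 1/\sqrt d$ lands near $t\approx 1/\kappa$. The dimension is scaled down to $d=300$ so the example runs in seconds.

```python
import numpy as np

def dilated_schedule(t, d, kappa=4.0):
    """Hypothetical dilation: returns (tau_t, dtau/dt) on [0, 2].
    tau rises to kappa/sqrt(d) on [0, 1], then linearly to 1 on [1, 2]."""
    c = kappa / np.sqrt(d)
    if t < 1.0:
        return c * t, c
    return c + (1.0 - c) * (t - 1.0), 1.0 - c

def velocity(x, a, b, da, db, mu, p, var):
    """Exact E[da*x0 + db*x1 | I_t = x] for I_t = a*x0 + b*x1, with
    x0 ~ N(0, I) and x1 ~ p N(mu, var[0] I) + (1 - p) N(-mu, var[1] I)."""
    d = x.shape[1]
    log_w, cond = [], []
    for pi_k, m_k, s2 in zip((p, 1.0 - p), (mu, -mu), var):
        v = a**2 + b**2 * s2                    # per-coordinate variance of I_t given mode k
        r = x - b * m_k                         # deviation from the mode-k mean of I_t
        log_w.append(np.log(pi_k) - 0.5 * d * np.log(v) - (r**2).sum(axis=1) / (2 * v))
        e_x0 = (a / v) * r                      # E[x0 | I_t = x, mode k]
        e_x1 = m_k + (b * s2 / v) * r           # E[x1 | I_t = x, mode k]
        cond.append(da * e_x0 + db * e_x1)
    log_w = np.stack(log_w)                     # shape (2, K)
    w = np.exp(log_w - log_w.max(axis=0))       # numerically stable posterior mode weights
    w /= w.sum(axis=0)
    return w[0][:, None] * cond[0] + w[1][:, None] * cond[1]

def estimate_p(dilated, d=300, p=0.8, K=2000, steps_per_unit=100, seed=0):
    """Euler-integrate the ODE from X_0 ~ N(0, I) and return the fraction
    of samples with M = mu . X / d > 0, which estimates p."""
    rng = np.random.default_rng(seed)
    mu = np.ones(d)                             # assumed mode direction, |mu|^2 = d
    x = rng.standard_normal((K, d))
    T = 2.0 if dilated else 1.0
    n = int(round(T * steps_per_unit))
    h = T / n
    for i in range(n):
        t = i * h
        b, db = dilated_schedule(t, d) if dilated else (t, 1.0)
        a, da = 1.0 - b, -db                    # alpha_t = 1 - tau_t, beta_t = tau_t
        x = x + h * velocity(x, a, b, da, db, mu, p, (1.0, 1.0))
    return float(((x @ mu) / d > 0).mean())

print("non-dilated:", estimate_p(dilated=False))
print("dilated:    ", estimate_p(dilated=True))
```

With 100 Euler steps per unit time, the non-dilated run crosses the speciation window in only a handful of steps, whereas the dilated run spends the whole first unit of time there. The dilated estimate should therefore land close to $p=0.8$. No specific value is claimed for the non-dilated estimate, because with an exact velocity its error depends on the step size and on $d$.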

Figures (3)

  • Figure 1: We learn the parameters from equation (eq:dae) for different choices of interpolant. In all experiments, we use $100$ discretization points and train for $5000$ epochs with $n=128$, $d=5000$, and $p=0.8$. We then run the probability flow ODE with the learned parameters for $K=2000$ realizations and estimate $p$ as the fraction of realizations with $M_t>0$, where $M_t=\mu\cdot X_t/d$. For the non-dilated interpolant (blue), we use $\alpha_t=1-t$, $\beta_t=t$. We predict speciation near $t=1/\sqrt{5000}\approx 0.014$, which the experiment confirms: most of the speciation occurs in the first two ODE steps. For the dilated interpolant (orange), we use $\alpha_t=1-\tau_t$, $\beta_t=\tau_t$, and $\kappa=4$. The dilated interpolant estimates $p=0.8$ much more accurately than the non-dilated one.
  • Figure 2: For $t_0\in[0.2, 0.65]$, we plot the proportion of $0$s obtained by performing the U-Turn at time $t_0$, starting from either a $0$ or a $1$ at time $t=1$. The dashed green line, $y=0.882$, is the estimated proportion of $0$s that the diffusion model generates starting from noise.
  • Figure 3: Non-cherry-picked samples from the three generative models considered. (a) Samples from the VP SDE, where training times are drawn with probability $1/2$ uniformly from $[0.2, 0.6]$ and with probability $1/2$ uniformly from the rest of $[0, 1]$. (b) Same as (a), except that with probability $1/2$ training times are sampled from $[0.3, 0.5]$. (c) Samples from the VP SDE with training times drawn uniformly from $[0, 1]$.
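The U-Turn in Figure 2 can be illustrated on a toy mixture. The NumPy sketch below is a hypothetical analogue, not the paper's MNIST procedure. It takes samples from one mode and noises them to time $t_0$ with the interpolant $\alpha_t=1-t$, $\beta_t=t$, so that $t=1$ is data. It then runs the exact probability flow ODE forward to $t=1$ and reports the fraction that land in the $+\mu$ mode. It assumes equal unit variances, which makes the posterior mode weight a closed-form logistic function. The helpers `velocity` and `u_turn` are illustrative names.

```python
import numpy as np

def velocity(x, t, mu, p):
    """Exact probability-flow velocity for I_t = (1 - t) x0 + t x1, with
    x0 ~ N(0, I) and x1 ~ p N(mu, I) + (1 - p) N(-mu, I) (toy assumption)."""
    a, b = 1.0 - t, t
    v = a**2 + b**2                               # per-coordinate variance of I_t in either mode
    # Posterior log-odds of the +mu mode; |x|^2 terms cancel since variances match.
    logit = np.log(p / (1.0 - p)) + 2.0 * b * (x @ mu) / v
    w_plus = 0.5 * (1.0 + np.tanh(0.5 * logit))   # sigmoid(logit), overflow-safe
    m = (2.0 * w_plus - 1.0)[:, None] * mu        # posterior mean of the mode center
    # E[x1 | x] - E[x0 | x] = m + ((b - a) / v) * (x - b * m)
    return m + ((b - a) / v) * (x - b * m)

def u_turn(start_sign, t0, d=200, p=0.8, K=1000, steps_per_unit=100, seed=0):
    """Noise data from one mode to time t0, run the ODE back to t = 1,
    and return the fraction of samples ending in the +mu mode."""
    rng = np.random.default_rng(seed)
    mu = np.ones(d)                               # assumed mode direction, |mu|^2 = d
    x1 = start_sign * mu + rng.standard_normal((K, d))
    x = (1.0 - t0) * rng.standard_normal((K, d)) + t0 * x1   # noised sample I_{t0}
    n = max(1, int(round((1.0 - t0) * steps_per_unit)))
    h = (1.0 - t0) / n
    for i in range(n):
        x = x + h * velocity(x, t0 + i * h, mu, p)            # explicit Euler toward t = 1
    return float(((x @ mu) / d > 0).mean())

for t0 in (0.05, 0.3, 0.6, 0.9):
    print(t0, u_turn(+1, t0), u_turn(-1, t0))
```

For $t_0$ close to $1$, the mode is already fixed, so the U-Turn returns to the starting mode. As $t_0$ decreases, the starting mode is progressively forgotten and the fraction drifts toward $p$. This transition region is where the feature (here, the mode) is decided, which is the kind of time interval the abstract describes targeting during training.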

Theorems & Definitions (20)

  • Proposition 1
  • Corollary 1: Parameters given infinite samples
  • Corollary 2
  • Corollary 3: Parameters given infinite samples
  • Corollary 4
  • Corollary 5
  • Corollary 6: Parameters $p$ and $\sigma^2$ are estimated correctly
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • ...and 10 more