Phase-aware Training Schedule Simplifies Learning in Flow-Based Generative Models
Santiago Aranguri, Francesco Insulla
TL;DR
The paper studies how a flow-based generative model learns features at different scales when sampling from a high-dimensional, unbalanced two-mode Gaussian mixture. A time-dilated training schedule reveals a two-phase learning process: the model first learns the probability of each mode, then the variance of each mode, with the velocity field simplifying to a small subspace in each phase. The authors provide sharp asymptotic characterizations of the learned parameters and show that the generated samples reflect both the mixing probability $p$ and the variance $\sigma^2$. On MNIST, they report preliminary experiments that identify, for a given feature, the time intervals where training improves accuracy the most. The work offers a principled approach to schedule design in flow-based and diffusion-like models and suggests strategies for more efficient, feature-aware training on real data.
Abstract
We analyze the training of a two-layer autoencoder used to parameterize a flow-based generative model for sampling from a high-dimensional Gaussian mixture. Previous work shows that, without an appropriate time schedule, the phase where the relative probability between the modes is learned disappears as the dimension goes to infinity. We introduce a time dilation that solves this problem. This enables us to characterize the learned velocity field, finding a first phase where the probability of each mode is learned and a second phase where the variance of each mode is learned. We find that the autoencoder representing the velocity field learns to simplify by estimating only the parameters relevant to each phase. Turning to real data, we propose a method that, for a given feature, finds intervals of time where training improves accuracy the most on that feature. Since practitioners typically sample training times uniformly, our method enables more efficient training. We provide preliminary experiments validating this approach.
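To make the contrast with uniform time sampling concrete, here is a minimal Python sketch of a training-time sampler that concentrates part of its mass on a chosen interval. It is an illustration only, not the authors' method: the function name `sample_training_times`, the `focus` parameter, the example interval `(0.2, 0.4)`, and the convention that times lie in [0, 1] are all assumptions introduced here.

```python
import numpy as np


def sample_training_times(n, interval=(0.2, 0.4), focus=0.5, rng=None):
    """Draw n training times in [0, 1] (hypothetical helper).

    With probability `focus`, a time is drawn uniformly from `interval`;
    otherwise it is drawn uniformly from [0, 1]. Setting focus=0 recovers
    the uniform baseline. The interval here is a placeholder, not a value
    taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = interval
    if not (0.0 <= lo < hi <= 1.0):
        raise ValueError("interval must satisfy 0 <= lo < hi <= 1")
    if not (0.0 <= focus <= 1.0):
        raise ValueError("focus must lie in [0, 1]")

    uniform = rng.uniform(0.0, 1.0, size=n)   # baseline: uniform over all times
    focused = rng.uniform(lo, hi, size=n)     # times restricted to the interval
    use_focused = rng.random(n) < focus       # per-sample choice between the two
    return np.where(use_focused, focused, uniform)
```

Under these assumptions, the expected fraction of sampled times that land in the interval is `focus + (1 - focus) * (hi - lo)`, so a training loop that draws its times from this sampler spends correspondingly more updates on the chosen interval than a uniform schedule would.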
