Latent Iterative Refinement Flow: A Geometric Constrained Approach for Few-Shot Generation
Songtao Li, Tianqi Hou, Zhenyu Liao, Ting Gao
TL;DR
This work identifies velocity field collapse as a core bottleneck when training diffusion/flow-matching models with limited data, explaining memorization as sampling trajectories collapsing onto point attractors located at the training samples. It introduces Latent Iterative Refinement Flow (LIRF), which embeds data in a semantically aligned latent space (via DiNO-VAE) and iteratively densifies the latent manifold through a generation–correction–augmentation loop, using a locally contracting correction operator to stabilize learning. The authors provide theoretical convergence guarantees for the manifold densification procedure and demonstrate improved diversity and recall over strong baselines on FFHQ subsets and Low-Shot datasets, while maintaining good generative fidelity. By emphasizing geometry-aware latent-space dynamics, the approach improves data efficiency in diffusion-based generation and offers practical gains for data-scarce domains; extensions to higher resolutions and broader domains remain open directions.
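The point-attractor picture of memorization can be reproduced in a toy setting. The sketch below is our own illustration, not the authors' code. It assumes the linear (rectified-flow) path x_t = (1 - t) x_0 + t x_1 with x_0 ~ N(0, I), and it replaces the trained network with the exact minimizer of the flow-matching loss on a finite training set, v*(x, t) = (E[x_1 | x_t = x] - x) / (1 - t). The helper names `optimal_velocity` and `sample` and the four-point training set are hypothetical choices made only for illustration.

```python
import numpy as np


def optimal_velocity(x, t, train):
    """Exact minimizer of the flow-matching loss for x_t = (1 - t) x0 + t x1,
    x0 ~ N(0, I), x1 uniform over the finite training set `train`.
    Since x1 - x0 = (x1 - x_t) / (1 - t), we get
    v*(x, t) = E[x1 - x0 | x_t = x] = (E[x1 | x_t = x] - x) / (1 - t)."""
    # p(x_t = x | x1 = y_i) = N(x; t * y_i, (1 - t)^2 I) -> softmax posterior weights
    sq = np.sum((x[:, None, :] - t * train[None, :, :]) ** 2, axis=-1)
    logits = -sq / (2.0 * (1.0 - t) ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    x1_hat = w @ train  # posterior mean of the data endpoint
    return (x1_hat - x) / (1.0 - t)


def sample(train, n=200, steps=100, seed=0):
    """Euler integration from Gaussian noise at t = 0 towards t = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, train.shape[1]))
    dt = 1.0 / steps
    # t runs over 0, dt, ..., 1 - dt; at the last step dt / (1 - t) = 1,
    # so x is moved onto x1_hat (up to rounding)
    for k in range(steps):
        x = x + dt * optimal_velocity(x, k * dt, train)
    return x


train = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
samples = sample(train)
# distance from each generated sample to its nearest training point
nearest = np.min(np.linalg.norm(samples[:, None, :] - train[None, :, :], axis=-1), axis=1)
print(nearest.max())
```

With this velocity field every trajectory ends, up to numerical precision, on one of the four training points, so sampling only reproduces the training set. A network fitted closely to this optimum on few samples would tend to inherit the same point attractors, which is one way to read the velocity field collapse described above.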
Abstract
Diffusion and flow-matching models trained with limited data tend to memorize the training data instead of generalizing, leading to severely reduced diversity. In this paper, we provide a dynamical perspective and identify this ``collapse-to-memorization'' phenomenon as a consequence of \emph{velocity field collapse}, where the learned field degenerates into isolated point attractors that trap the sampling trajectories. Inspired by this perspective, we introduce \textbf{Latent Iterative Refinement Flow (LIRF)}, a geometry-aware framework for from-scratch training of diffusion models in the limited-data regime. By exploiting the intrinsic geometry of a semantically aligned latent space, LIRF progressively densifies the training data manifold via a \emph{generation--correction--augmentation} closed loop, thereby effectively resolving the velocity field collapse. We also provide a theoretical guarantee for the convergence of this manifold densification procedure. Experiments on FFHQ subsets and Low-Shot datasets show that LIRF outperforms existing diffusion models for limited-data generation, achieving significantly higher diversity and recall while maintaining comparable generative quality.
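The generation–correction–augmentation loop can be illustrated on a toy manifold. This is a minimal sketch under stated assumptions, not LIRF itself. The latent manifold is taken to be the unit circle in R^2. The "generator" is a hypothetical noisy interpolation between stored points, standing in for sampling a trained flow model in the DiNO-VAE latent space. The correction operator is the relaxed projection z ← z + λ(P(z) - z), which contracts the distance to the circle by exactly (1 - λ) per step. All function names and parameter values are hypothetical.

```python
import numpy as np


def project_to_circle(z):
    """Nearest point on the unit circle (toy stand-in for the data manifold)."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def dist_to_circle(z):
    return np.abs(np.linalg.norm(z, axis=1) - 1.0)


def correct(z, lam=0.5, steps=8):
    """Locally contracting correction: since ||z'|| = (1 - lam) ||z|| + lam,
    each step gives ||z'|| - 1 = (1 - lam)(||z|| - 1)."""
    for _ in range(steps):
        z = z + lam * (project_to_circle(z) - z)
    return z


def generate(data, n, rng, noise=0.05):
    """Hypothetical generator: noisy interpolation between random stored pairs."""
    i = rng.integers(0, len(data), size=n)
    j = rng.integers(0, len(data), size=n)
    u = rng.uniform(0.0, 1.0, size=(n, 1))
    z = (1 - u) * data[i] + u * data[j] + noise * rng.standard_normal((n, 2))
    return z[np.linalg.norm(z, axis=1) > 0.1]  # drop points too close to the origin


def max_angular_gap(data):
    """Largest arc between consecutive stored points (smaller = denser coverage)."""
    theta = np.sort(np.mod(np.arctan2(data[:, 1], data[:, 0]), 2 * np.pi))
    gaps = np.diff(np.concatenate([theta, [theta[0] + 2 * np.pi]]))
    return gaps.max()


def densify(data, rounds=5, per_round=64, tol=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        candidates = correct(generate(data, per_round, rng))  # generation + correction
        accepted = candidates[dist_to_circle(candidates) < tol]  # acceptance filter
        data = np.concatenate([data, accepted], axis=0)  # augmentation
    return data


angles = np.linspace(0, 2 * np.pi, 8, endpoint=False) + 0.1 * np.sin(np.arange(8))
seed_data = np.stack([np.cos(angles), np.sin(angles)], axis=1)
dense = densify(seed_data)
print(len(seed_data), len(dense))
print(max_angular_gap(seed_data), max_angular_gap(dense))
```

Across rounds the largest angular gap between stored points shrinks, so the set becomes denser along the manifold while every accepted point stays within `tol` of it. A real latent data manifold has no closed-form projection; the exact projection here is used only so that the contraction property is easy to verify.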
