Latent Iterative Refinement Flow: A Geometric Constrained Approach for Few-Shot Generation

Songtao Li, Tianqi Hou, Zhenyu Liao, Ting Gao

TL;DR

This work identifies velocity field collapse as a core bottleneck when training diffusion/flow-matching models with limited data, explaining memorization as trajectories collapsing to point attractors around training samples. It introduces Latent Iterative Refinement Flow (LIRF), which embeds data in a semantically aligned latent space (via DiNO-VAE) and iteratively densifies the latent manifold through a generation–correction–augmentation loop, with a locally contracting correction operator to stabilize learning. The authors provide theoretical convergence guarantees for the manifold densification procedure and demonstrate improved diversity and recall over strong baselines on FFHQ subsets and Low-Shot datasets, while maintaining good generative fidelity. The approach emphasizes geometry-aware latent-space dynamics to enhance data efficiency in diffusion-based generation, offering practical gains for data-scarce domains and highlighting avenues for higher-resolution and broader-domain extensions.

Abstract

Diffusion and flow-matching models trained on limited data tend to memorize the training set rather than generalize, which severely reduces sample diversity. In this paper, we take a dynamical perspective and identify this "collapse-to-memorization" phenomenon as a consequence of velocity field collapse, in which the learned field degenerates into isolated point attractors that trap the sampling trajectories. Motivated by this view, we introduce Latent Iterative Refinement Flow (LIRF), a geometry-aware framework for from-scratch training of diffusion models in the limited-data regime. By exploiting the intrinsic geometry of a semantically aligned latent space, LIRF progressively densifies the training data manifold via a generation–correction–augmentation closed loop, thereby resolving the velocity field collapse. We also provide a theoretical guarantee on the convergence of this manifold densification procedure. Experiments on FFHQ subsets and Low-Shot datasets show that LIRF outperforms existing diffusion models for limited-data generation, achieving significantly higher diversity and recall with comparably good generative quality.
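To make the "point attractor" picture concrete, here is a minimal toy sketch. It is not the paper's analysis: it assumes a standard Gaussian source, the linear path $x_t=(1-t)x_0+t x_1$, and a target that is just three hand-picked 2D training points. Under these assumptions the marginal flow-matching velocity has a closed form (a softmax-weighted pull toward the training points), and integrating it sends every trajectory onto one of those points. The names `empirical_velocity` and `sample` are illustrative only.

```python
import numpy as np

def empirical_velocity(x, t, data):
    """Closed-form marginal velocity for x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, I) and x1 uniform over the rows of `data` (toy assumption)."""
    s = 1.0 - t
    # Posterior weight of each training point: p_t(x | x_i) = N(t * x_i, s^2 I).
    sq = np.sum((x[None, :] - t * data) ** 2, axis=1)
    logw = -sq / (2.0 * s * s)
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    w /= w.sum()
    # The conditional velocity toward x_i is (x_i - x) / (1 - t).
    return (w[:, None] * (data - x[None, :])).sum(axis=0) / s

def sample(data, n_steps=200, seed=0):
    """Euler integration of the ODE from t = 0 to just before t = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(data.shape[1])
    ts = np.linspace(0.0, 1.0 - 1e-3, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * empirical_velocity(x, t0, data)
    return x

# Three hypothetical "training samples" in 2D.
data = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])
ends = np.stack([sample(data, seed=s) for s in range(20)])

# Distance from each generated sample to its nearest training point.
dists = np.linalg.norm(ends[:, None, :] - data[None, :, :], axis=2).min(axis=1)
print(dists.max())
```

In this toy setting the generated samples reproduce the training points almost exactly, which is the memorization behavior the abstract attributes to velocity field collapse.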

Paper Structure

This paper contains 45 sections, 6 theorems, 54 equations, 12 figures, 5 tables.

Key Result

Proposition 3.2

Conditioning on a fixed neighborhood $\mathcal{N}_k$ and associated weights $\{w_j\}$ (and hence a fixed local reference point $p \triangleq z_{\mathrm{ref}}\neq 0$), the correction operator $\mathcal{C}(\cdot)$ in Definition 3.1 exhibits a local Euclidean contr[…] A constructive bound on $\kappa$, together with a detailed proof, is provided in app:contraction_pr[…]
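The flavor of such a local contraction can be checked numerically. This page does not give the form of $\mathcal{C}(\cdot)$, so the sketch below makes a loud assumption: latents are unit vectors, and the correction is a spherical interpolation (SLERP) of strength $\lambda$ toward a fixed reference point $p$. The function names, the dimension, $\lambda=0.5$ and the 1-radian neighborhood are all hypothetical choices for illustration.

```python
import numpy as np

def slerp_correction(z, p, lam):
    """Hypothetical correction: move unit vector z a fraction `lam` of the
    way toward unit vector p along the great circle joining them."""
    cos_theta = np.clip(np.dot(z, p), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:  # z already coincides with p
        return p.copy()
    return (np.sin((1.0 - lam) * theta) * z + np.sin(lam * theta) * p) / np.sin(theta)

def sample_cap(p, max_angle, n, rng):
    """Draw n unit vectors whose angle to p lies in (0.05, max_angle)."""
    dirs = rng.standard_normal((n, p.size))
    dirs -= np.outer(dirs @ p, p)  # remove the component along p
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    angles = rng.uniform(0.05, max_angle, size=n)
    return np.cos(angles)[:, None] * p + np.sin(angles)[:, None] * dirs

rng = np.random.default_rng(0)
dim, lam, max_angle = 8, 0.5, 1.0  # illustrative values, not from the paper
p = np.zeros(dim)
p[0] = 1.0
zs = sample_cap(p, max_angle, 200, rng)
cs = np.stack([slerp_correction(z, p, lam) for z in zs])

# Empirical contraction factor: largest ratio of pairwise Euclidean
# distances after versus before the correction.
i, j = np.triu_indices(len(zs), k=1)
before = np.linalg.norm(zs[i] - zs[j], axis=1)
after = np.linalg.norm(cs[i] - cs[j], axis=1)
kappa_hat = float(np.max(after / before))
print(kappa_hat)
```

Within a neighborhood of half-angle below $\pi/2$ around $p$, the estimated factor stays below 1 for this stand-in operator. Far from $p$ (near its antipode) the same map can expand distances, which is why a contraction of this kind holds only locally.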

Figures (12)

  • Figure 1: Illustration of velocity field collapse for vanilla flow matching, ADA, and the proposed LIRF on 2D spiral data. As the number of training samples decreases, vanilla flow matching degrades towards "memorization" of isolated training samples, while ADA produces fragmented generations with disconnected regions. In contrast, LIRF preserves a continuous generative distribution that remains aligned with the underlying spiral manifold.
  • Figure 2: Visualization of learned velocity fields in the setting of Figure 1. Arrows indicate the direction and magnitude of the learned velocity field, while red dots denote training samples.
  • Figure 3: Overview of Latent Iterative Refinement Flow (LIRF). (1) Training images are encoded by a frozen DiNO-VAE into a semantically aligned latent space, forming the current training set $Z^{(r)}$. (2) A flow-matching model is trained on $Z^{(r)}$ and periodically generates a candidate set $\tilde{Z}^{(r)}$ via ODE sampling. (3) Each candidate $\tilde{z}$ is refined using a geometric correction operator $\mathcal{C}(\cdot)$, which pulls it towards a local reference point $z_{\mathrm{ref}}$. (4) Corrected samples with small correction magnitude ($\delta(\tilde{z})=\|\tilde{z}-\mathcal{C}(\tilde{z})\|_2\leq\tau$) are admitted into $\mathcal{A}^{(r)}$ and merged into the training set to form $Z^{(r+1)}$. This generation–correction–augmentation loop progressively densifies the latent manifold and mitigates velocity field collapse under data scarcity.
  • Figure 4: Top/Third Rows (SD-VAE): Interpolation trajectories traverse semantically implausible regions, resulting in severe ghosting artifacts, blurriness, and incoherent facial structures. Second/Bottom Rows (DiNO-VAE): Intermediate samples remain sharp and structurally coherent, confirming that the latent space exhibits approximate semantic convexity under local interpolation.
  • Figure 5: Sensitivity analysis of correction strength $\lambda$ and admission threshold $\tau$ on FFHQ-100. The curves show FID for different fixed values of $\lambda$ under varying $\tau$. Shaded regions indicate the standard deviation over 5 runs. The dashed line corresponds to the adaptive schedule, where $\lambda$ is linearly decayed from $0.8$ to $0.2$.
  • ...and 7 more figures
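The loop described in the Figure 3 caption, together with the linear $\lambda$ schedule mentioned in the Figure 5 caption, can be sketched roughly as follows. This is a schematic under stated assumptions, not the authors' implementation. The flow-matching training and ODE sampling step is replaced by a noisy-copy stand-in. The reference point is a hypothetical softmax-weighted mean of the $k$ nearest training latents. The correction is a plain linear pull toward that point. The schedule is assumed to decay over refinement rounds. All function names and numeric values other than the 0.8 to 0.2 range are illustrative.

```python
import numpy as np

def knn_reference(z, train, k=5):
    """Hypothetical local reference point: softmax-weighted mean of the
    k nearest training latents (the weighting rule is an assumption)."""
    d = np.linalg.norm(train - z, axis=1)
    idx = np.argsort(d)[:k]
    w = np.exp(-(d[idx] - d[idx].min()))
    w /= w.sum()
    return w @ train[idx]

def linear_pull(z, z_ref, lam):
    """Stand-in correction operator: move z a fraction lam toward z_ref."""
    return (1.0 - lam) * z + lam * z_ref

def lambda_schedule(r, n_rounds, start=0.8, end=0.2):
    """Linear decay of the correction strength across rounds (assumed axis)."""
    if n_rounds <= 1:
        return start
    return start + (end - start) * r / (n_rounds - 1)

def refinement_round(train, candidates, lam, tau, correct=linear_pull):
    """One correction-and-augmentation step: correct each candidate and admit
    it only if the correction magnitude delta is at most tau."""
    admitted = []
    for z in candidates:
        z_ref = knn_reference(z, train)
        z_corr = correct(z, z_ref, lam)
        delta = np.linalg.norm(z - z_corr)
        if delta <= tau:
            admitted.append(z_corr)
    if not admitted:
        return train
    return np.vstack([train, np.asarray(admitted)])

rng = np.random.default_rng(0)
train = rng.standard_normal((20, 4))  # toy latents standing in for Z^(0)
n_rounds = 3
for r in range(n_rounds):
    # Stand-in for "train a flow-matching model, then ODE-sample candidates".
    base = train[rng.integers(len(train), size=10)]
    candidates = base + 0.1 * rng.standard_normal(base.shape)
    train = refinement_round(train, candidates, lambda_schedule(r, n_rounds), tau=0.5)
print(train.shape)
```

Passing the correction as a callable keeps the admission logic separate from the geometry, so a different operator can be swapped in without touching the rest of the loop.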

Theorems & Definitions (11)

  • Definition 3.1: Correction operator
  • Proposition 3.2: Local Euclidean contraction of $\mathcal{C}(\cdot)$
  • Theorem 3.3: Convergence of manifold densification
  • Lemma 1.1: Angular shrinkage of SLERP
  • proof
  • Lemma 1.2: Cosine-gap contraction on a bounded angle range
  • proof
  • Proposition 1.3: Local Euclidean contraction of $\mathcal{C}(\cdot)$
  • proof
  • Theorem 1.4: Convergence of manifold densification
  • ...and 1 more
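
For orientation on the SLERP lemma listed above, the standard angular-shrinkage identity for spherical interpolation can be worked out in a few lines. The exact statement of Lemma 1.1 is not reproduced on this page, so the following is the textbook identity under the assumption of unit vectors $u$, $p$ with angle $\theta\in(0,\pi)$ and interpolation weight $\lambda\in[0,1]$. It is not necessarily the paper's formulation.

```latex
% SLERP between unit vectors u and p, with \theta = \arccos\langle u, p\rangle:
\mathrm{slerp}(u, p; \lambda)
  = \frac{\sin\big((1-\lambda)\theta\big)}{\sin\theta}\, u
  + \frac{\sin(\lambda\theta)}{\sin\theta}\, p .

% Inner product with p, using \langle u, p\rangle = \cos\theta and \|p\|_2 = 1:
\big\langle \mathrm{slerp}(u, p; \lambda),\, p \big\rangle
  = \frac{\sin\big((1-\lambda)\theta\big)\cos\theta + \sin(\lambda\theta)}{\sin\theta} .

% Expand \sin(\lambda\theta) = \sin\big(\theta - (1-\lambda)\theta\big)
%   = \sin\theta\cos\big((1-\lambda)\theta\big) - \cos\theta\sin\big((1-\lambda)\theta\big),
% so the \cos\theta terms cancel:
\big\langle \mathrm{slerp}(u, p; \lambda),\, p \big\rangle
  = \cos\big((1-\lambda)\theta\big)
\quad\Longrightarrow\quad
\angle\big(\mathrm{slerp}(u, p; \lambda),\, p\big) = (1-\lambda)\,\theta .
```

The angle to the target therefore shrinks by exactly the factor $1-\lambda$, which is the kind of property a contraction argument for a SLERP-based correction would build on.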