Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis

TL;DR

This paper addresses the inefficiency and degraded multi-step performance of consistency models by introducing Align Your Flow (AYF), a continuous-time flow-map distillation framework. AYF learns flow maps with two continuous-time objectives (AYF-EMD and AYF-LMD), and leverages autoguidance plus lightweight adversarial fine-tuning to boost quality while preserving diversity. The approach yields state-of-the-art few-step generation on ImageNet at 64x64 and 512x512, and extends to text-to-image distillation with LoRA on FLUX. These results demonstrate that flow-map distillation can outperform prior non-adversarial methods in both efficiency and quality, with broad implications for scalable diffusion and flow-based generation.

Abstract

Diffusion- and flow-based models have emerged as state-of-the-art generative modeling approaches, but they require many sampling steps. Consistency models can distill these models into efficient one-step generators; however, unlike flow- and diffusion-based methods, their performance inevitably degrades when increasing the number of steps, which we show both analytically and empirically. Flow maps generalize these approaches by connecting any two noise levels in a single step and remain effective across all step counts. In this paper, we introduce two new continuous-time objectives for training flow maps, along with additional novel training techniques, generalizing existing consistency and flow matching objectives. We further demonstrate that autoguidance can improve performance, using a low-quality model for guidance during distillation, and an additional boost can be achieved by adversarial finetuning, with minimal loss in sample diversity. We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks and achieve state-of-the-art few-step generation performance on both ImageNet 64x64 and 512x512, using small and efficient neural networks. Finally, we show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.

Paper Structure

This paper contains 50 sections, 8 theorems, 66 equations, 28 figures, 5 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $p_{\textrm{data}}({\mathbf{x}}) = \mathcal{N}(\mathbf{0}, c^2{\bm{I}})$ be the data distribution, and let $\mathbf{f}^*({\mathbf{x}}_t, t)$ denote the optimal consistency model. For any $\delta > 0$, there exists a suboptimal consistency model $\mathbf{f}({\mathbf{x}}_t, t)$ such that and there is some integer $N$ for which increasing the number of sampling steps beyond $N$ increases the Wass
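
A small numerical sketch can make the setting of Theorem 3.1 concrete. It assumes one-dimensional Gaussian data with $c = 0.5$, the linear interpolation path $x_t = (1-t)x_0 + t\epsilon$, a uniform time grid, and a hypothetical suboptimal model that scales the optimal consistency model by $(1+\delta)$. None of these choices is taken from the paper's proof. Because every distribution involved stays a zero-mean Gaussian, the sample variance can be tracked in closed form, and the Wasserstein-2 distance to the data is the gap between the two standard deviations.

```python
import math

def w2_multistep(num_steps, delta, c=0.5):
    """W2 distance to N(0, c^2) after multi-step consistency sampling.

    Assumptions (illustrative, not from the paper): path x_t = (1 - t) x_0 + t eps,
    uniform time grid t_k = 1 - k / num_steps, and a model
    f(x, t) = (1 + delta) * f_opt(x, t) with f_opt(x, t) = x * c / sigma_t.
    """
    gain = (1.0 + delta) ** 2
    var = None  # variance of the current clean-sample estimate
    for k in range(num_steps):
        t = 1.0 - k / num_steps
        # Variance of x_t under the true data distribution.
        sigma_t_sq = (1.0 - t) ** 2 * c ** 2 + t ** 2
        if var is None:
            var_t = 1.0  # start from pure noise at t = 1
        else:
            var_t = (1.0 - t) ** 2 * var + t ** 2  # re-noise the estimate to level t
        # Apply the (possibly suboptimal) consistency model to get a new estimate.
        var = gain * c ** 2 * var_t / sigma_t_sq
    # W2 between two zero-mean 1D Gaussians is the difference of standard deviations.
    return abs(math.sqrt(var) - c)

for n in (1, 2, 4, 8):
    print(n, w2_multistep(n, delta=0.05))
```

In this toy setup the distance grows across the printed step counts when `delta` is positive, since each late, low-noise step re-applies the model's error to an estimate that is already slightly off. With `delta=0` the distance stays at zero for any step count.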

Figures (28)

  • Figure 1: Four-step samples by our distilled text-conditioned flow map model (prompts in Appendix).
  • Figure 2: Overview of Flow Maps. Flow maps generalize both consistency models and flow matching by connecting any two noise levels $(s, t)$ in a single step. When $s=0$, flow maps reduce to consistency models; when $s \to t$, they are equivalent to standard flow matching models. Our proposed EMD objective (see Theorem 3.2) similarly generalizes the continuous-time consistency and flow matching losses. For detailed derivations, please see the Appendix.
  • Figure 3: Samples (4 steps): LCM (Luo2023LCMLoRAAU), TCD (zheng2024trajectory), FLUX.1 [schnell] (flux2023), AYF (view zoomed in).
  • Figure 4: Two-step AYF samples on ImageNet512.
  • Figure 5: Wasserstein-2 distance between multi-step consistency samples and data distribution ($c{=}0.5$).
  • ...and 23 more figures
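
The relationships described in the Figure 2 caption can be checked on a toy case where the flow map is known in closed form. The sketch below assumes scalar Gaussian data $\mathcal{N}(0, c^2)$ and the linear path $x_t = (1-t)x_0 + t\epsilon$, for which the probability-flow ODE rescales $x_t$ by $\sigma_s/\sigma_t$. The function names and the argument order (from $t$ to $s$) are illustrative choices, not the paper's notation.

```python
import math

def sigma(t, c=0.5):
    """Std of x_t = (1 - t) * x_0 + t * eps with x_0 ~ N(0, c^2), eps ~ N(0, 1)."""
    return math.sqrt((1.0 - t) ** 2 * c ** 2 + t ** 2)

def flow_map(x_t, t, s, c=0.5):
    """Exact flow map for the toy setup: jump from noise level t to s in one step."""
    return x_t * sigma(s, c) / sigma(t, c)

def velocity(x_t, t, c=0.5):
    """Exact probability-flow velocity dx_t/dt for the same toy setup."""
    return x_t * (t - (1.0 - t) * c ** 2) / sigma(t, c) ** 2

x = 1.3

# s = 0: the flow map acts like a consistency model (noise -> data in one jump).
one_step = flow_map(x, 1.0, 0.0)

# Two jumps through an intermediate level land on the same point as one jump.
two_step = flow_map(flow_map(x, 1.0, 0.4), 0.4, 0.0)

# s -> t: the slope of the flow map recovers the flow matching velocity.
h = 1e-6
fd_slope = (flow_map(x, 0.7, 0.7 - h) - x) / (-h)

print(one_step, two_step)
print(fd_slope, velocity(x, 0.7))
```

The composition check is the property that lets an exact flow map stay accurate at any step count: splitting a jump into smaller jumps does not change where it lands.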

Theorems & Definitions (11)

  • Theorem 3.1: Proof in Appendix
  • Theorem 3.2: Proof in Appendix
  • Theorem 3.3: Proof in Appendix
  • Theorem C.1: Restated from Theorem 3.1
  • proof
  • Theorem C.2: Restated from Theorem 3.2
  • proof
  • Corollary C.3
  • Theorem C.4: Restated from Theorem 3.3
  • proof
  • ...and 1 more