Simple ReFlow: Improved Techniques for Fast Flow Models

Beomsu Kim, Yu-Guan Hsieh, Michal Klein, Marco Cuturi, Jong Chul Ye, Bahjat Kawar, James Thornton

TL;DR

This work examines the design space of ReFlow and proposes seven improvements to training dynamics, learning, and inference. Thorough ablation studies on CIFAR10, AFHQv2, and FFHQ verify these improvements, and the combined techniques achieve state-of-the-art FID scores for fast generation via neural ODEs.

Abstract

Diffusion and flow-matching models achieve remarkable generative performance but at the cost of many sampling steps, which slows inference and limits applicability to time-critical tasks. The ReFlow procedure can accelerate sampling by straightening generation trajectories. However, ReFlow is an iterative procedure, typically requiring training on simulated data, and results in reduced sample quality. To mitigate sample deterioration, we examine the design space of ReFlow and highlight potential pitfalls in prior heuristic practices. We then propose seven improvements for training dynamics, learning and inference, which are verified with thorough ablation studies on CIFAR10 $32 \times 32$, AFHQv2 $64 \times 64$, and FFHQ $64 \times 64$. Combining all our techniques, we achieve state-of-the-art FID scores (without / with guidance, resp.) for fast generation via neural ODEs: $2.23$ / $1.98$ on CIFAR10, $2.30$ / $1.91$ on AFHQv2, $2.84$ / $2.67$ on FFHQ, and $3.49$ / $1.74$ on ImageNet-64, all with merely $9$ neural function evaluations.
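
To illustrate why straighter trajectories permit fewer sampling steps, here is a minimal sketch that is not taken from the paper. It assumes a hypothetical one-dimensional coupling in which each noise sample $x_0$ is paired with $x_1 = a x_0 + b$, so the interpolation path is a straight line with constant velocity. The function names, the values of `a` and `b`, and the curved comparison field are all illustrative assumptions.

```python
def straight_velocity(x, t, a=2.0, b=1.0):
    """Velocity along the straight path x_t = (1 - t) * x0 + t * (a * x0 + b).

    Assumed toy coupling: x1 = a * x0 + b. Along each path the velocity is the
    constant x1 - x0, expressed here as a function of the current state (x, t).
    """
    x0 = (x - t * b) / (1.0 + t * (a - 1.0))  # recover the path's starting point
    return (a - 1.0) * x0 + b


def curved_velocity(x, t):
    """A field with curved trajectories, x(t) = x0 * exp(t), for comparison."""
    return x


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler."""
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity(x, i * dt)  # one function evaluation per step
    return x


x0 = 0.5
# Straight field: one step and many steps agree up to floating-point error.
straight_one = euler_sample(straight_velocity, x0, 1)
straight_many = euler_sample(straight_velocity, x0, 100)
# Curved field: a single step incurs a large discretization error.
curved_one = euler_sample(curved_velocity, x0, 1)
curved_many = euler_sample(curved_velocity, x0, 1000)
```

In this toy setting a single Euler step on the straight field already lands on $a x_0 + b$, whereas the curved field needs many steps to approach its exact solution. Real ReFlow models only approximate straightness, so a few steps rather than one are typically used.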

Paper Structure

This paper contains 38 sections, 4 theorems, 58 equations, 16 figures, 12 tables.

Key Result

Proposition 1

Let $w(\bm{x}_t,t)$, $d\mathbb{T}(t)$ be positive, and $\phi$ be an invertible linear map. Then, $\theta$ minimizes Eq. (eq:gen_reflow_loss) if and only if it minimizes Eq. (eq:fm_loss).

Figures (16)

  • Figure 1: Min., avg., max. relative losses after training on CIFAR10.
  • Figure 2: Time distribution densities.
  • Figure 3: Comparison of flow matching (FM) with $\ell_{\mathrm{MSE}}$ and Pseudo-Huber (PH) losses.
  • Figure 4: $\lambda$ ablation.
  • Figure 5: CIFAR10 training.
  • ...and 11 more figures

Theorems & Definitions (6)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof