Training Flow Matching: The Role of Weighting and Parameterization

Anne Gagneux; Ségolène Martin; Rémi Gribonval; Mathurin Massias

TL;DR

The goal of this systematic numerical study is to disentangle the various factors that matter when training a flow matching model, in order to provide practical insights on design choices.

Abstract

We study the training objectives of denoising-based generative models, with a particular focus on loss weighting and output parameterization, including noise-, clean-image-, and velocity-based formulations. Through a systematic numerical study, we analyze how these training choices interact with the intrinsic dimensionality of the data manifold, the model architecture, and the dataset size. Our experiments span synthetic datasets with controlled geometry as well as image data, and compare training objectives using quantitative metrics for denoising accuracy (PSNR across noise levels) and generative quality (FID). Rather than proposing a new method, our goal is to disentangle the various factors that matter when training a flow matching model, in order to provide practical insights on design choices.
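To make the terms "output parameterization" and "loss weighting" concrete, here is a minimal NumPy sketch. It is an illustration and not the authors' code. It assumes the common linear interpolation path $x_t = (1-t)\,x_0 + t\,x_1$ with $x_0$ noise and $x_1$ clean data, a convention the abstract does not state. All function names are hypothetical. Under this assumption, a clean-image or noise prediction can be converted into a velocity prediction, and a velocity loss equals a denoising loss rescaled by $1/(1-t)^2$.

```python
import numpy as np

# Assumed convention (not stated in the abstract): x0 is noise, x1 is clean data,
# and the path is the linear interpolation x_t = (1 - t) * x0 + t * x1.

def interpolate(x0, x1, t):
    """Point on the assumed linear path between noise x0 and data x1."""
    return (1.0 - t) * x0 + t * x1

def velocity_from_denoiser(x_t, x1_hat, t):
    """Velocity implied by a clean-image prediction x1_hat (valid for t < 1)."""
    return (x1_hat - x_t) / (1.0 - t)

def velocity_from_noise(x_t, x0_hat, t):
    """Velocity implied by a noise prediction x0_hat (valid for t > 0)."""
    return (x_t - x0_hat) / t

def psnr(x, x_hat, data_range=1.0):
    """Peak signal-to-noise ratio in dB for signals with the given dynamic range."""
    mse = np.mean((x - x_hat) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)   # noise sample
x1 = rng.standard_normal(8)   # stand-in for a data sample
t = 0.3
x_t = interpolate(x0, x1, t)
true_velocity = x1 - x0       # target velocity along the linear path

# A perfect denoiser or a perfect noise predictor recovers the true velocity.
assert np.allclose(velocity_from_denoiser(x_t, x1, t), true_velocity)
assert np.allclose(velocity_from_noise(x_t, x0, t), true_velocity)

# With an imperfect denoiser, the velocity error is the denoising error
# scaled by 1 / (1 - t)^2, which is the weighting induced by the change of output.
x1_hat = x1 + 0.1
velocity_error = np.sum((velocity_from_denoiser(x_t, x1_hat, t) - true_velocity) ** 2)
denoising_error = np.sum((x1_hat - x1) ** 2)
assert np.isclose(velocity_error, denoising_error / (1.0 - t) ** 2)
```

Under the assumed path, the factor $1/(1-t)^2$ grows without bound as $t \to 1$, which is one reason the choice of weighting and the choice of network output are worth studying separately.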

Paper Structure (29 sections, 7 equations, 8 figures, 5 tables)


Introduction
Background and related works
Background on flow matching
Loss weightings
Unifying perspectives
Data-prediction versus velocity-prediction
Generation as denoising
A common ground for comparison
Parametrization classes
Losses and induced weightings
Decoupling weightings and parametrizations
Evaluating the impact of weightings
Metrics
Numerical results
Understanding optimal weightings near $t=1$
...and 14 more sections

Figures (8)

Figure 1: PSNR and FID for the different losses, CIFAR-10. Models that reach the highest PSNR (low difference in PSNR compared to standard FM, $w^t_\mathrm{vel}$) also reach the lowest FID.
Figure 2: PSNR and FID for the different parametrizations, CIFAR-10. Models that reach the highest PSNR (low difference in PSNR compared to standard FM) also reach the lowest FID.
Figure 3: Denoising and generation performance of $\mathcal{C}_{\mathrm{vel}}$ versus $\mathcal{C}_{\mathrm{den}}$ when varying the patch size in the ViT architecture. CIFAR-10 (dimension $3 \times 32^2$). In the notation ViT/$p$, $p$ denotes a patch size $p \times p$. Green indicates that $\mathcal{C}_{\mathrm{den}}$ performs better than $\mathcal{C}_{\mathrm{vel}}$, red indicates the opposite.
Figure 4: Nine samples from the Fourier-32 dataset with controlled manifold dimension $m$.
Figure 5: PSNR gap on the Fourier-32 dataset for intrinsic dimensions $m\in\{4,8,16\}$, across four architectures. Green indicates that $\mathcal{C}_{\mathrm{den}}$ performs better than $\mathcal{C}_{\mathrm{vel}}$, red indicates the opposite.
...and 3 more figures
