Preconditioned Score and Flow Matching

Shadab Ahamed; Eshed Gal; Simon Ghyselincks; Md Shahriar Rahim Siddiqui; Moshe Eliasof; Eldad Haber

TL;DR

This work proposes reversible, label-conditional maps that reshape the geometry of $p_t$ by improving the conditioning of $\Sigma_t$ without altering the underlying generative model. It empirically tracks conditioning diagnostics and distributional metrics and shows that preconditioning consistently yields better-trained models by avoiding suboptimal plateaus.

Abstract

Flow matching and score-based diffusion train vector fields under intermediate distributions $p_t$, whose geometry can strongly affect their optimization. We show that the covariance $\Sigma_t$ of $p_t$ governs optimization bias: when $\Sigma_t$ is ill-conditioned, gradient-based training rapidly fits high-variance directions while systematically under-optimizing low-variance modes, leading to learning that plateaus at suboptimal weights. We formalize this effect in analytically tractable settings and propose reversible, label-conditional preconditioning maps that reshape the geometry of $p_t$ by improving the conditioning of $\Sigma_t$ without altering the underlying generative model. Rather than accelerating early convergence, preconditioning primarily mitigates optimization stagnation by enabling continued progress along previously suppressed directions. Across MNIST latent flow matching and additional high-resolution datasets, we empirically track conditioning diagnostics and distributional metrics and show that preconditioning consistently yields better-trained models by avoiding suboptimal plateaus.
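
To make the idea of a reversible, label-conditional map concrete, here is a minimal sketch that uses a per-label affine whitening transform. This is an assumption chosen for illustration, not necessarily the construction used in the paper; the function names `fit_whitening`, `precondition`, and `unprecondition`, the regularizer `eps`, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_whitening(x, labels, eps=1e-6):
    """Fit one affine map per label: x_tilde = L^{-1} (x - mu), with L L^T = Cov + eps * I."""
    params = {}
    for c in np.unique(labels):
        xc = x[labels == c]
        mu = xc.mean(axis=0)
        cov = np.cov(xc, rowvar=False) + eps * np.eye(x.shape[1])
        params[c] = (mu, np.linalg.cholesky(cov))  # lower-triangular factor L
    return params

def precondition(x, labels, params):
    """Map data to the whitened space, using the map that matches each sample's label."""
    out = np.empty_like(x)
    for c, (mu, chol) in params.items():
        mask = labels == c
        out[mask] = np.linalg.solve(chol, (x[mask] - mu).T).T
    return out

def unprecondition(x_tilde, labels, params):
    """Exact inverse of `precondition`: x = L x_tilde + mu."""
    out = np.empty_like(x_tilde)
    for c, (mu, chol) in params.items():
        mask = labels == c
        out[mask] = x_tilde[mask] @ chol.T + mu
    return out

# Synthetic two-class data with strongly anisotropic, class-dependent scales (illustrative only).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=2000)
scales = np.array([[10.0, 0.1], [0.05, 5.0]])
x = rng.standard_normal((2000, 2)) * scales[labels] + 3.0 * labels[:, None]

params = fit_whitening(x, labels)
x_tilde = precondition(x, labels, params)
x_back = unprecondition(x_tilde, labels, params)

for c in (0, 1):
    before = np.linalg.cond(np.cov(x[labels == c], rowvar=False))
    after = np.linalg.cond(np.cov(x_tilde[labels == c], rowvar=False))
    print(f"label {c}: condition number before = {before:.1f}, after = {after:.4f}")
```

Because the map is affine and invertible, a model trained on `x_tilde` can still generate in the original space by applying `unprecondition` to its samples, which is the sense in which the generative model itself is left unchanged.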

Paper Structure (64 sections, 1 theorem, 112 equations, 12 figures, 5 tables)

Introduction
Contributions.
Flow and Score Matching: Preliminaries
Score-based Diffusion
Flow Matching
A Unified Optimization Perspective
Gaussian Transport Model: A Solvable Case of Optimization Difficulty
Setup: Ill-conditioned Target Covariance
Analytic Score and Velocity in the Gaussian Case
Velocity.
Key observation.
Linear Regression Objective and Exact Minimizer
Condition Number and Convergence
Alternative Optimization Techniques.
Extension to Gaussian Mixture Model
...and 49 more sections

Key Result

Theorem 4.1

Consider the linear least squares objective $\mathcal{L}(A)$, where $x \in \mathbb{R}^d$, $y \in \mathbb{R}^d$, and $\Sigma = \mathbb{E}[xx^\top]$ is positive definite with condition number $\kappa(\Sigma) = \lambda_{\max}(\Sigma)/\lambda_{\min}(\Sigma)$. Let $A^\star$ denote the optimal solution and let gradient descent be used to minimize $\mathcal{L}(A)$. T
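
The setting of this theorem can be explored numerically. The sketch below assumes the objective takes the form $\mathcal{L}(A) = \tfrac{1}{2}\mathbb{E}\|Ax - y\|^2$ with noiseless targets $y = A^\star x$; the exact objective is not shown in this excerpt, so that form, the step size $1/\lambda_{\max}(\Sigma)$, and the helper name `gd_least_squares` are assumptions made for illustration. Under these assumptions the population gradient is $(A - A^\star)\Sigma$, and the error along the eigen-direction with eigenvalue $\lambda_i$ shrinks by a factor $1 - \lambda_i/\lambda_{\max}$ per step, so low-variance directions barely move.

```python
import numpy as np

def gd_least_squares(sigma, a_star, steps):
    """Population gradient descent on L(A) = 0.5 * E||A x - y||^2 with y = A* x.

    The gradient is (A - A*) @ Sigma; the step size is 1 / lambda_max(Sigma).
    """
    eta = 1.0 / np.linalg.eigvalsh(sigma).max()
    a = np.zeros_like(a_star)
    for _ in range(steps):
        a = a - eta * (a - a_star) @ sigma
    return a

sigma = np.diag([1.0, 1e-4])                     # condition number 1e4
a_star = np.array([[1.0, 2.0], [-1.0, 0.5]])     # arbitrary illustrative optimum

# Baseline: gradient descent in the original coordinates.
a_plain = gd_least_squares(sigma, a_star, steps=200)

# Preconditioned: x_tilde = Sigma^{-1/2} x has identity covariance,
# and the optimum in the new coordinates is A* Sigma^{1/2}.
w, v = np.linalg.eigh(sigma)
sqrt_sigma = (v * np.sqrt(w)) @ v.T
inv_sqrt_sigma = (v / np.sqrt(w)) @ v.T
a_tilde = gd_least_squares(np.eye(2), a_star @ sqrt_sigma, steps=200)
a_precond = a_tilde @ inv_sqrt_sigma             # map back to the original coordinates

err_plain = np.linalg.norm(a_plain - a_star)
err_precond = np.linalg.norm(a_precond - a_star)
print(f"error without preconditioning: {err_plain:.3e}")
print(f"error with preconditioning:    {err_precond:.3e}")
```

With the same step budget, the baseline fits the high-variance column and leaves the low-variance column almost untouched, while the whitened problem is solved essentially exactly. This mirrors the plateau behavior described in the abstract, though only for this simplified quadratic case.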

Figures (12)

Figure 1: (Top) Baseline: Standard flow matching learns a transport path from a Gaussian reference to the complex data distribution $x_1 \in \mathcal{Z}$. When $\mathcal{Z}$ exhibits strong anisotropy, the intermediate distributions $x_t \sim p_t$ inherit ill-conditioned covariances, causing gradient-based optimization to stagnate on low-variance modes. (Bottom) Precondition-then-Match: We introduce a reversible preconditioning operator $\mathcal{P}$ that maps the target data to a latent representation $\tilde{\mathcal{Z}}$ that is closer to a Gaussian distribution. This transformation reshapes the geometry, ensuring that the intermediate distributions $\tilde{x}_t \sim \tilde{p}_t$ remain well-conditioned throughout the transport process, improving optimization speed. Samples are drawn by mapping back to the original data space via $\mathcal{P}^{-1}$, improving convergence without altering the underlying generative model capacity.
Figure 2: Comparison of flow matching without and with preconditioning on a 2D Gaussian transport task. (a) Standard flow matching transports samples from the source Gaussian (blue) to the elongated target Gaussian (green). (b) Preconditioned flow matching transports samples to a whitened target (black triangles) (flow lines not visible), then inverts the preconditioning to obtain the transported samples. (c) Preconditioning avoids early optimization stagnation and achieves a lower Maximum Mean Discrepancy (MMD) between the target and the transported distribution. (d) Condition number of the learned transport map over time for both methods. The preconditioned approach achieves lower condition numbers, improving sample alignment and optimization stability.
Figure 3: Flow matching without preconditioning. (a) Ground-truth Swiss roll samples $x_1 \sim p_1$. (b) Generated samples $\hat{x}_1$ obtained by transporting Gaussian source $z$ through a learned flow. The structural mismatch highlights the challenge of learning complex transport maps without preconditioning.
Figure 4: Flow matching preconditioned by a normalizing flow. (a) The data $x_1$ (Swiss roll) is pushed forward by the normalizing flow to $\tilde{x}_1$. (b) The points $\tilde{x}_1$ computed in (a) are integrated with a learned velocity field to Gaussian points $\tilde{z}$. (c) Random Gaussian points $z$ are pushed backwards using the flow to obtain $\tilde{x}_1$. (d) The points $\tilde{x}_1$ in (c) are pushed backwards using the normalizing flow.
Figure 5: Flow matching preconditioned by a low-accuracy flow. (a) The data (Swiss roll) is pushed forward by a low-accuracy preconditioning flow. (b) The points computed in (a) are integrated with a learned velocity field to a Gaussian. (c) Random Gaussian points are pushed backwards using the high-capacity flow. (d) The points in (c) are pushed backwards using the low-capacity flow.
...and 7 more figures
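
The claim in the Figure 1 caption, that intermediate distributions inherit ill-conditioned covariances from an anisotropic target, can be checked in a simple Gaussian case. The sketch below assumes the linear interpolant $x_t = (1 - t) z + t x_1$ with independent $z \sim \mathcal{N}(0, I)$ and $x_1 \sim \mathcal{N}(0, \Sigma_1)$, which gives $\Sigma_t = (1 - t)^2 I + t^2 \Sigma_1$. The interpolant, the example $\Sigma_1$, and the helper name `interpolant_cov` are assumptions for illustration and may differ from the paper's setup.

```python
import numpy as np

def interpolant_cov(t, sigma_1):
    """Covariance of x_t = (1 - t) z + t x_1 for independent z ~ N(0, I), x_1 ~ N(0, Sigma_1)."""
    d = sigma_1.shape[0]
    return (1.0 - t) ** 2 * np.eye(d) + t ** 2 * sigma_1

sigma_1 = np.diag([25.0, 0.01])      # anisotropic target covariance (illustrative)
ts = np.linspace(0.0, 1.0, 5)

# Condition number of Sigma_t along the path, for the raw and the whitened target.
kappa_baseline = [np.linalg.cond(interpolant_cov(t, sigma_1)) for t in ts]
kappa_whitened = [np.linalg.cond(interpolant_cov(t, np.eye(2))) for t in ts]

for t, kb, kw in zip(ts, kappa_baseline, kappa_whitened):
    print(f"t = {t:.2f}: kappa baseline = {kb:10.2f}, kappa whitened = {kw:.2f}")
```

In this toy case the baseline condition number grows along the path toward that of the target, while whitening the target keeps every $\Sigma_t$ a multiple of the identity.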

Theorems & Definitions (1)

Theorem 4.1: Preconditioning Improves Convergence
