Risk-Entropic Flow Matching

Vahid R. Ramezani, Benjamin Englard

TL;DR

This paper investigates applying the log-exponential entropic risk transform to Flow Matching (FM) to emphasize rare or high-loss transport directions and better capture multimodal data geometry. By defining a conditional entropic FM objective and deriving its gradient, the authors reveal a Gibbs-tilted mean that yields two interpretable first-order corrections: covariance preconditioning and a skew tail term that biases updates toward minority branches. They further show that a marginal entropic upper bound provides a practical surrogate for optimization. Experiments on a synthetic 2D ring demonstrate that moderate risk levels improve angular concentration, reduce inter-lobe gaps, and better preserve the residual distribution, suggesting a robust way to enhance FM in multi-modal settings with tractable optimization.

Abstract

Tilted (entropic) risk, obtained by applying a log-exponential transform to a base loss, is a well-established tool in statistics and machine learning for emphasizing rare or high-loss events while retaining a tractable optimization problem. In this work, our aim is to interpret its structure for Flow Matching (FM). FM learns a velocity field that transports samples from a simple source distribution to data by integrating an ODE. In rectified FM, training pairs are obtained by linearly interpolating between a source sample and a data sample, and a neural velocity field is trained to predict the straight-line displacement using a mean squared error loss. This squared loss collapses all velocity targets that reach the same space-time point into a single conditional mean, thereby ignoring higher-order conditional information (variance, skewness, multi-modality) that encodes fine geometric structure about the data manifold and minority branches. We apply the standard risk-sensitive (log-exponential) transform to the conditional FM loss and show that the resulting tilted risk loss is a natural upper bound on a meaningful conditional entropic FM objective defined at each space-time point. Furthermore, we show that a small-order expansion of the gradient of this conditional entropic objective yields two interpretable first-order corrections: covariance preconditioning of the FM residual, and a skew tail term that favors asymmetric or rare branches. On synthetic data designed to probe ambiguity and tails, the resulting risk-sensitive loss improves statistical metrics and recovers geometric structure more faithfully than standard rectified FM.
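To make the objects in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds rectified FM training pairs, evaluates the per-sample squared loss, and applies the generic log-exponential transform $\frac{1}{\lambda}\log \mathbb{E}[e^{\lambda \ell}]$ over a batch. The function names, the ring radius, the value of $\lambda$, and the placeholder velocity model `v_pred = -xt` are all assumptions made for illustration; the paper's scheduled FM-RISK loss and its conditional objective may be normalized or weighted differently.

```python
import numpy as np

def rectified_fm_pairs(x0, x1, t):
    """Linear interpolation x_t = (1 - t) x0 + t x1 with target velocity x1 - x0."""
    t = t[:, None]
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0

def per_sample_sq_loss(v_pred, v_target):
    """Squared error per sample (summed over coordinates)."""
    return np.sum((v_pred - v_target) ** 2, axis=-1)

def tilted_risk(losses, lam):
    """Return (1/lam) * log mean exp(lam * loss) and the Gibbs weights.

    The weights are proportional to exp(lam * loss); they are the weights with
    which per-sample gradients are averaged when this loss is differentiated.
    """
    z = lam * losses
    m = np.max(z)                      # shift for numerical stability
    e = np.exp(z - m)
    log_mean_exp = m + np.log(np.mean(e))
    return log_mean_exp / lam, e / e.sum()

rng = np.random.default_rng(0)
n = 512
x0 = rng.normal(size=(n, 2))                        # source samples
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
x1 = 3.0 * np.stack([np.cos(theta), np.sin(theta)], axis=1)  # ring targets (radius assumed)
t = rng.uniform(size=n)

xt, v_target = rectified_fm_pairs(x0, x1, t)
v_pred = -xt                                        # hypothetical stand-in for a neural velocity field
losses = per_sample_sq_loss(v_pred, v_target)

mse = losses.mean()                                 # standard FM objective on this batch
risk, weights = tilted_risk(losses, lam=0.05)       # tilted objective on the same batch
print(f"mean loss = {mse:.3f}, tilted risk = {risk:.3f}")
```

By Jensen's inequality the tilted value is never below the plain mean for $\lambda > 0$, and it approaches the mean as $\lambda \to 0$. The Gibbs weights put more mass on high-loss samples, which is the mechanism the abstract describes as emphasizing rare or high-loss events.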

Paper Structure

This paper contains 33 sections, 3 theorems, 126 equations, 5 figures.

Key Result

Lemma 1

Let with $D_0 \neq 0$. Then

Figures (5)

  • Figure 1: Ground-truth straight-line transport paths in the synthetic 2D ring experiment. Blue dots mark source points $x_0$ and orange dots their paired targets $x_1$ on the ring; each colored segment shows the straight-line trajectory $x_t = (1-t)x_0 + t x_1$. The outer dashed circle indicates the target ring.
  • Figure 2: $\mathrm{RMSE}_\sigma$ vs. $\lambda_{\max}$ for the FM baseline (dashed line) and scheduled FM-RISK. Lower is better.
  • Figure 3: Relative $\mathrm{RMSE}_\sigma$ improvement $(\mathrm{RMSE}_\sigma^{\text{base}} - \mathrm{RMSE}_\sigma^{\text{risk}}) / \mathrm{RMSE}_\sigma^{\text{base}}$ vs. $\lambda_{\max}$ for the FM baseline (dashed line) and scheduled FM-RISK. Higher is better.
  • Figure 4: Gap violation rate vs. $\lambda_{\max}$ for the FM baseline (dashed line) and scheduled FM-RISK. Lower is better.
  • Figure 5: Mean Wasserstein-1 distance on absolute angular residuals $\mathrm{W}_1(|r|)$ vs. $\lambda_{\max}$ for the FM baseline (dashed line) and scheduled FM-RISK. Lower is better.

Theorems & Definitions (5)

  • Lemma 1: First-order ratio expansion
  • Proof
  • Theorem 1: First-order expansion of the tilted FM gradient
  • Proof: Proof of Theorem
  • Corollary 1