Time dependent loss reweighting for flow matching and diffusion models is theoretically justified

Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell

TL;DR

The paper gives a theoretical justification for time-dependent loss weighting in flow matching and diffusion models. Working in the Generator Matching (GM) framework, it allows the linear parametrization of the generator and the Bregman divergence loss to depend on both time and state, and it allows the expectation over time in the loss to be taken under a broad class of time distributions. Under these conditions the GM and conditional GM (CGM) losses remain aligned, and the result extends to Flow Matching, Diffusion, Jump, and Edit Flow (EF) settings. Key contributions include formalizing the equivalence of time-weighted losses, decomposing losses across multiple linear components, and deriving practical implications for $X_1$- and $x_0$-prediction as well as for EF-based architectures. The results give a principled basis for the time-dependent loss schedules commonly used in practice across this family of generative models.

Abstract

This brief note clarifies that, in Generator Matching (which subsumes a large family of flow matching and diffusion models over continuous, manifold, and discrete spaces), both the Bregman divergence loss and the linear parameterization of the generator can depend on both the current state $X_t$ and the time $t$, and we show that the expectation over time in the loss can be taken with respect to a broad class of time distributions. We also show this for Edit Flows, which falls outside of Generator Matching. That the loss can depend on $t$ clarifies that time-dependent loss weighting schemes, often used in practice to stabilize training, are theoretically justified when the specific flow or diffusion scheme is a special case of Generator Matching (or Edit Flows). It also often simplifies the construction of $X_1$-predictor schemes, which are sometimes preferred for model-related reasons. We show examples that rely upon the dependence of linear parameterizations, and of the Bregman divergence loss, on $t$ and $X_t$.
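
To make the notion of a time-dependent loss weight concrete, the following is a minimal sketch of a time-weighted flow matching loss. It is not taken from the paper: the linear interpolation path, the squared-error loss (one particular Bregman divergence), and the names `weighted_fm_loss`, `velocity_model`, and `weight` are all assumptions chosen for illustration.

```python
import numpy as np

def weighted_fm_loss(velocity_model, x0, x1, t, weight):
    """Monte Carlo estimate of E[w(t) * ||v(t, X_t) - (X_1 - X_0)||^2].

    Assumes the linear path X_t = (1 - t) X_0 + t X_1, whose conditional
    velocity is X_1 - X_0. All names here are hypothetical.
    """
    t_col = t[:, None]                       # broadcast time over feature axis
    x_t = (1.0 - t_col) * x0 + t_col * x1    # point on the assumed linear path
    target = x1 - x0                         # conditional velocity for that path
    sq_err = np.sum((velocity_model(t, x_t) - target) ** 2, axis=1)
    return float(np.mean(weight(t) * sq_err))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(1024, 2))              # source samples
x1 = rng.normal(loc=3.0, size=(1024, 2))     # stand-in for data samples
t = rng.uniform(size=1024)                   # times drawn uniformly on [0, 1]

# Placeholder model that always predicts zero velocity.
zero_model = lambda t, x: np.zeros_like(x)

uniform_loss = weighted_fm_loss(zero_model, x0, x1, t, lambda s: np.ones_like(s))
weighted_loss = weighted_fm_loss(zero_model, x0, x1, t, lambda s: 1.0 + 2.0 * s)
```

The only difference between the two calls is the `weight` function, which is the part a time-dependent loss schedule changes.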

Paper Structure

This paper contains 17 sections, 10 theorems, 112 equations.

Key Result

Lemma 4.7

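Let $\mathcal{D}$ be any probability measure on $[0,1]$ dominating the Lebesgue measure $\lambda$, i.e. $\mathcal{D} \gg \lambda$, and let $w(t) \geq 0$ satisfy $\mathbb{E}_{t\sim \mathcal{D}}[w(t)] < \infty$ and $w(t) > 0$ for $\lambda$-almost every $t\in [0,1]$. Then

$$\widetilde{\mathcal{D}}(dt) = \frac{w(t)}{K}\,\mathcal{D}(dt)$$

is a probability measure on $[0,1]$ such that $\widetilde{\mathcal{D}}\gg \lambda$, and it holds

$$\mathbb{E}_{t\sim \mathcal{D}}\left[w(t)\, f(t)\right] = K\, \mathbb{E}_{t\sim \widetilde{\mathcal{D}}}\left[f(t)\right]$$

for every measurable $f \geq 0$, where $K = \int w(t)\, \mathcal{D}(dt) > 0$.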
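
A small numerical check can illustrate how such a lemma is typically used: weighting by $w(t)$ under $\mathcal{D}$ matches, up to the constant $K$, an unweighted expectation under a reweighted time distribution. The sketch below assumes the reweighted measure has density proportional to $w$ with respect to $\mathcal{D}$. The choices $\mathcal{D} = \mathrm{Uniform}[0,1]$, $w(t) = 1 + 2t$, and the stand-in per-time loss $f(t) = t^2$ are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumptions for illustration only: D = Uniform[0, 1], w(t) = 1 + 2t, f(t) = t^2.
def w(t):
    return 1.0 + 2.0 * t

def f(t):
    return t ** 2

K = 2.0  # K = integral of (1 + 2t) over [0, 1]

# Left side: sample t ~ D and weight the integrand by w(t).
t_d = rng.uniform(0.0, 1.0, size=n)
lhs = np.mean(w(t_d) * f(t_d))

# Right side: sample t ~ D_tilde (density (1 + 2t) / 2) by inverting its CDF
# F(t) = (t + t^2) / 2, then rescale the unweighted average by K.
u = rng.uniform(0.0, 1.0, size=n)
t_tilde = (-1.0 + np.sqrt(1.0 + 8.0 * u)) / 2.0
rhs = K * np.mean(f(t_tilde))

print(lhs, rhs)  # both are Monte Carlo estimates of the exact value 5/6
```

Because $K$ does not depend on model parameters, the two objectives share the same minimizers. In this reading, choosing a loss weight and choosing a time-sampling distribution are interchangeable.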

Theorems & Definitions (45)

  • Remark 4.1
  • Remark 4.2
  • Remark 4.3
  • Remark 4.4
  • Example 4.5
  • Example 4.6
  • Lemma 4.7
  • proof
  • Theorem 4.8
  • proof
  • ...and 35 more