CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching

Chen Chen, Pengsheng Guo, Liangchen Song, Jiasen Lu, Rui Qian, Xinze Wang, Tsu-Jui Fu, Wei Liu, Yinfei Yang, Alex Schwing

TL;DR

In conditional flow matching, the velocity network must both transport mass and inject the condition. CAR-Flow eases this burden with a condition-aware reparameterization that applies lightweight, learned, shift-only adjustments to the source distribution, the target distribution, or both. Aligning the two distributions shortens the transport the velocity field must learn, and restricting the reparameterization to shifts avoids trivial zero-cost collapse modes. The result is faster training and better sample fidelity, demonstrated on synthetic data and on ImageNet-256, where FID improves from 2.07 to 1.68 with less than 0.6% additional parameters. The method yields three practical variants (source-only, target-only, joint), with the joint version performing best, and is compatible with existing backbones such as SiT-XL/2. Overall, CAR-Flow is a simple plug-in enhancement for large-scale conditional generative modeling that encodes the condition in the source and target distributions rather than solely in the velocity network.

Abstract

Conditional generative modeling aims to learn a conditional data distribution from samples containing data-condition pairs. For this, diffusion and flow-based methods have attained compelling results. These methods use a learned (flow) model to transport an initial standard Gaussian noise that ignores the condition to the conditional data distribution. The model is hence required to learn both mass transport and conditional injection. To ease the demand on the model, we propose Condition-Aware Reparameterization for Flow Matching (CAR-Flow) -- a lightweight, learned shift that conditions the source, the target, or both distributions. By relocating these distributions, CAR-Flow shortens the probability path the model must learn, leading to faster training in practice. On low-dimensional synthetic data, we visualize and quantify the effects of CAR-Flow. On higher-dimensional natural image data (ImageNet-256), equipping SiT-XL/2 with CAR-Flow reduces FID from 2.07 to 1.68, while introducing less than 0.6% additional parameters.
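The effect of relocating the source can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes a linear interpolant between the reparameterized endpoints, two hypothetical 1D classes, and shifts set by hand to the class means (in CAR-Flow the shifts are learned). The names `shift_reparam`, `linear_path`, `mu0`, and `mu1` are illustrative.

```python
import numpy as np

def shift_reparam(x0, x1, y, mu0, mu1):
    """Shift-only maps (assumed form): z0 = x0 + mu0[y], z1 = x1 + mu1[y]."""
    return x0 + mu0[y], x1 + mu1[y]

def linear_path(z0, z1, t):
    """Linear interpolant z_t and its constant target velocity z1 - z0."""
    zt = (1.0 - t) * z0 + t * z1
    return zt, z1 - z0

rng = np.random.default_rng(0)
n = 4096
y = rng.integers(0, 2, size=n)                # two hypothetical classes
class_means = np.array([-4.0, 4.0])           # toy data, not from the paper
x0 = rng.standard_normal(n)                   # condition-agnostic source
x1 = class_means[y] + 0.5 * rng.standard_normal(n)

# Baseline: no shift on either end.
zero = np.zeros(2)
z0, z1 = shift_reparam(x0, x1, y, zero, zero)
_, v_base = linear_path(z0, z1, t=0.5)

# Source-only variant: the shift is hand-set here; it would be learned.
z0, z1 = shift_reparam(x0, x1, y, class_means, zero)
_, v_car = linear_path(z0, z1, t=0.5)

base_len = np.mean(np.abs(v_base))            # average transport distance
car_len = np.mean(np.abs(v_car))
print(f"baseline: {base_len:.2f}, source-shifted: {car_len:.2f}")
```

In this toy setting the shifted source already sits on each class mean, so the target velocities the network would have to regress are much shorter than in the baseline.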

Paper Structure

This paper contains 28 sections, 30 equations, 9 figures, 4 tables, 2 algorithms.

Figures (9)

  • Figure 1: Condition-Aware Reparameterization for Flow Matching (CAR-Flow). Illustration of the push-forward under standard conditional flow matching (direct mapping $x_{0}\to x_{1}$) versus our Condition-Aware Reparameterization (CAR-Flow) chain ($x_{0}\to z_{0}\to z_{1}\to x_{1}$). In the standard setting, a condition-agnostic prior sample (red) is carried by the network’s velocity field directly to each condition-dependent data manifold (blue), forcing the network to handle long-range transport and semantic injection at once. CAR-Flow, in contrast, employs a lightweight source distribution map $f(\cdot,y)$ and a lightweight target distribution map $g(\cdot,y)$ to align the source and target distributions, relieving the network of unnecessary transport. During sampling, $x_1$ is obtained via the (approximate) inverse map $g^{-1}(\cdot,y)$.
  • Figure 2: Learned flow trajectories on 1D synthetic data. Each panel shows trajectories from source $x_0$ (bottom) to target $x_1$ (top) for (a) baseline and CAR-Flow variants--(b) source-only, (c) target-only, and (d) joint. Intermediate stages $z_0$ and $z_1$ reflect reparameterized coordinates. Colored densities represent predicted and ground-truth class distributions (red/blue: prediction; magenta/cyan: ground truth). Thin lines illustrate individual sample trajectories between $z_0$ and $z_1$. Dashed vertical lines mark $\pm 3\sigma$ for each shift. The source-only CAR-Flow relocates the source distribution per class, while the target-only variant unifies the trajectory endpoints. The joint variant combines both and achieves the best alignment and flow quality.
  • Figure 3: Comparison of convergence and learned shifts. (a) shows the Wasserstein distance between predicted and ground-truth distributions on a symlog scale; joint CAR-Flow achieves the fastest convergence. (b) plots the evolution of the learned shifts $\mu_0$ (top) and $\mu_1$ (bottom) for two classes.
  • Figure 4: Mode collapse diagnostics with scale reparameterization. (a)–(b) Evolution of learned $\sigma$ and validation error. (c)–(d) Learned flows when allowing shift+scale on source vs. target.
  • Figure 5: Convergence on ImageNet $256\times256$: FID vs. training steps. CAR-Flow variants consistently converge faster than the baseline.
  • ...and 4 more figures
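The sampling chain described in the Figure 1 caption ($x_{0}\to z_{0}\to z_{1}\to x_{1}$) can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes shift-only maps $f(x_0,y)=x_0+\mu_0(y)$ and $g(x_1,y)=x_1+\mu_1(y)$, for which the inverse $g^{-1}$ is an exact subtraction, a plain Euler integrator, and a constant-drift function `toy_velocity` standing in for the trained velocity network. All names and numeric values are hypothetical.

```python
import numpy as np

def euler_sample(velocity, z0, y, steps=50):
    """Integrate dz/dt = velocity(z, t, y) from t=0 to t=1 with Euler steps."""
    z = z0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        z = z + dt * velocity(z, k * dt, y)
    return z

def car_flow_sample(velocity, x0, y, mu0, mu1, steps=50):
    """Follow the chain x0 -> z0 -> z1 -> x1 for shift-only maps."""
    z0 = x0 + mu0[y]                           # f(x0, y): shift the source
    z1 = euler_sample(velocity, z0, y, steps)  # transport in z-space
    return z1 - mu1[y]                         # g^{-1}(z1, y): undo target shift

# Hypothetical stand-ins: hand-set shifts and a constant-drift "network".
mu0 = np.array([-1.0, 1.0])
mu1 = np.array([3.0, -3.0])
drift = np.array([0.5, -0.5])

def toy_velocity(z, t, y):
    """Placeholder for a trained velocity network; ignores z and t."""
    return drift[y]

rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
x1 = car_flow_sample(toy_velocity, x0, y, mu0, mu1)
```

With a real model, `toy_velocity` would be replaced by the network's prediction, and an ODE solver of higher order than Euler could be substituted without changing the chain.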

Theorems & Definitions (3)

  • Claim 1
  • Claim 2
  • Proof