VeCoR: Velocity Contrastive Regularization for Flow Matching
Zong-Wei Hong, Jing-lun Li, Lin-Ze Li, Shen Zhang, Yao Tang
TL;DR
VeCoR adds Velocity Contrastive Regularization to Flow Matching in the form of a bidirectional attract–repel signal in velocity space. Negative velocity candidates, generated by augmentation-like perturbations in the image, latent, and velocity domains, regularize trajectory evolution and reduce off-manifold drift. Empirically, VeCoR yields substantial FID improvements and faster convergence on ImageNet-1K 256×256 class-conditional generation and MS-COCO text-to-image generation, with the largest gains for lightweight models and low number-of-function-evaluation (NFE) budgets. The approach is plug-and-play and data-efficient, and it requires no architectural changes, offering a practical path to more stable, higher-fidelity flow-based generation.
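The negative velocity candidates described above can be illustrated with a minimal sketch. It assumes the common linear (rectified-flow) path x_t = (1 − t)·x0 + t·x1, whose target velocity is v = x1 − x0, and builds negatives in two hypothetical ways: jittering the data endpoint and recomputing the velocity (a latent-domain perturbation), or adding noise directly to the target velocity (a velocity-domain perturbation). Image-domain perturbations are not shown. The function names and the noise scale `sigma` are illustrative assumptions, not the authors' code.

```python
import random
from typing import List

Vector = List[float]


def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Target velocity of the assumed linear path x_t = (1 - t) * x0 + t * x1."""
    return [b - a for a, b in zip(x0, x1)]


def latent_negative(x0: Vector, x1: Vector, sigma: float, rng: random.Random) -> Vector:
    """Hypothetical latent-domain negative: jitter the data endpoint, then recompute the velocity."""
    x1_perturbed = [b + rng.gauss(0.0, sigma) for b in x1]
    return target_velocity(x0, x1_perturbed)


def velocity_negative(v: Vector, sigma: float, rng: random.Random) -> Vector:
    """Hypothetical velocity-domain negative: add Gaussian noise directly to the target velocity."""
    return [c + rng.gauss(0.0, sigma) for c in v]


def make_negatives(x0: Vector, x1: Vector, k: int, sigma: float, seed: int = 0) -> List[Vector]:
    """Build k negative velocity candidates, alternating between the two perturbation types."""
    rng = random.Random(seed)
    v = target_velocity(x0, x1)
    negatives: List[Vector] = []
    for i in range(k):
        if i % 2 == 0:
            negatives.append(latent_negative(x0, x1, sigma, rng))
        else:
            negatives.append(velocity_negative(v, sigma, rng))
    return negatives


if __name__ == "__main__":
    noise = [0.1, -0.3, 0.5]  # x0: a toy noise sample
    data = [1.0, 0.0, -1.0]   # x1: a toy data sample
    print("positive:", target_velocity(noise, data))
    for neg in make_negatives(noise, data, k=4, sigma=0.5):
        print("negative:", [round(c, 3) for c in neg])
```

Both perturbations produce directions that stay close to, but differ from, the reference velocity, which is the kind of "nearby but wrong" direction a repulsive term can usefully push against.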
Abstract
Flow Matching (FM) has recently emerged as a principled and efficient alternative to diffusion models. Standard FM trains the learned velocity field to follow a target direction; however, errors can accumulate along the trajectory and drive samples off the data manifold, degrading perceptual quality, especially in lightweight or low-step configurations. To improve stability and generalization, we extend FM into a balanced attract-repel scheme that provides explicit guidance on both "where to go" and "where not to go." Formally, we propose Velocity Contrastive Regularization (VeCoR), a complementary training scheme for flow-based generative modeling that augments the standard FM objective with contrastive, two-sided supervision. VeCoR not only aligns the predicted velocity with a stable reference direction (positive supervision) but also pushes it away from inconsistent, off-manifold directions (negative supervision). This contrastive formulation turns FM from a purely attractive, one-sided objective into a two-sided training signal, regularizing trajectory evolution and improving perceptual fidelity across datasets and backbones. On ImageNet-1K 256×256, VeCoR yields 22% and 35% relative FID reductions on the SiT-XL/2 and REPA-SiT-XL/2 backbones, respectively, and a further 32% relative FID reduction on MS-COCO text-to-image generation, demonstrating consistent improvements in stability, convergence, and image quality, particularly in low-step and lightweight settings. Project page: https://p458732.github.io/VeCoR_Project_Page/
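To make the two-sided objective concrete, here is a minimal sketch of one possible per-sample loss. The attractive term is the standard FM squared error to the positive (reference) velocity; the repulsive term is an assumed hinge that penalizes the prediction only while it lies within a squared-distance `margin` of a negative candidate, weighted by `lam`. The hinge form, `margin`, and `lam` are illustrative assumptions; the abstract does not specify VeCoR's exact contrastive formulation.

```python
from typing import Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two velocity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fm_loss(v_pred: Vector, v_pos: Vector) -> float:
    """Standard one-sided Flow Matching term: pull v_pred toward the positive velocity."""
    return sq_dist(v_pred, v_pos)


def repel_loss(v_pred: Vector, negatives: Sequence[Vector], margin: float) -> float:
    """Assumed hinge repulsion: penalize v_pred only while it is within `margin` of a negative."""
    if not negatives:
        return 0.0
    total = sum(max(0.0, margin - sq_dist(v_pred, v_neg)) for v_neg in negatives)
    return total / len(negatives)


def contrastive_velocity_loss(
    v_pred: Vector,
    v_pos: Vector,
    negatives: Sequence[Vector],
    lam: float = 0.1,
    margin: float = 1.0,
) -> float:
    """Two-sided objective: FM attraction plus lam-weighted repulsion from negatives."""
    return fm_loss(v_pred, v_pos) + lam * repel_loss(v_pred, negatives, margin)


if __name__ == "__main__":
    v_pos = [1.0, 0.0]
    negatives = [[0.0, 1.0], [1.2, 0.1]]
    # The prediction is close to the positive but also close to the second negative,
    # so it pays a small repulsion penalty on top of the FM error.
    print(contrastive_velocity_loss([1.1, 0.0], v_pos, negatives))
```

A bounded (hinge) repulsion is one reasonable design choice here: the model gains nothing from pushing predictions arbitrarily far from negatives, so the repulsive term cannot overwhelm the FM attraction toward the reference velocity.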
