Contrastive Flow Matching
George Stoica, Vivek Ramanujan, Xiang Fan, Ali Farhadi, Ranjay Krishna, Judy Hoffman
TL;DR
This work introduces Contrastive Flow Matching (DeltaFM), a plug-and-play addition to conditional diffusion training: a contrastive loss that enforces cross-class flow uniqueness, addressing the tendency of conditional flow matching to produce overlapping trajectories. DeltaFM increases discriminability across conditions without extra data or forward passes. It achieves up to 9x faster training and 5x fewer denoising steps while improving image quality (FID lowered by up to ~8.9) on ImageNet-1k class-conditional generation and CC3M-based text-to-image generation. The approach is compatible with Representation Alignment (REPA) and can be combined with classifier-free guidance (CFG) to further improve performance. The paper also gives an analysis linking DeltaFM to CFG, along with ablations showing consistent gains across model scales and datasets. The results suggest that enforcing distinct conditional flows can markedly improve generation fidelity and efficiency in diffusion-based models, and that the idea may extend to other conditional generative tasks.
Abstract
Unconditional flow matching trains diffusion models to transport samples from a source distribution to a target distribution by enforcing that the flows between sample pairs are unique. However, in conditional settings (e.g., class-conditioned models), this uniqueness is no longer guaranteed: flows from different conditions may overlap, leading to more ambiguous generations. We introduce Contrastive Flow Matching, an extension to the flow matching objective that explicitly enforces uniqueness across all conditional flows, enhancing condition separation. Our approach adds a contrastive objective that maximizes dissimilarities between predicted flows from arbitrary sample pairs. We validate Contrastive Flow Matching through extensive experiments across varying model architectures on both class-conditioned (ImageNet-1k) and text-to-image (CC3M) benchmarks. Notably, we find that training models with Contrastive Flow Matching (1) speeds up training by up to 9x, (2) requires up to 5x fewer denoising steps, and (3) lowers FID by up to 8.9 compared to training the same models with flow matching. We release our code at: https://github.com/gstoica27/DeltaFM.git.
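The abstract describes the objective only at a high level: a standard flow matching term plus a contrastive term that pushes each predicted flow away from the flows of other sample pairs. The NumPy sketch below is one way such a loss could be written, under assumptions that are not stated in this section: a straight-line interpolation path from Gaussian noise to data, squared-L2 distance as the (dis)similarity, negatives obtained by cyclically shifting the batch, and a hypothetical weight `lam`. The function name `contrastive_flow_matching_loss` and the `model(x_t, t, y)` signature are illustrative only and are not the API of the released code.

```python
import numpy as np


def contrastive_flow_matching_loss(model, x1, y, lam=0.05, seed=None):
    """Hypothetical sketch: flow matching loss minus a contrastive term.

    Assumptions (not taken from the paper's code): a straight-line path
    x_t = (1 - t) * x0 + t * x1 from Gaussian noise x0 to data x1, target
    flow x1 - x0, squared-L2 (dis)similarity, and each sample's negative
    being the flow of the previous sample in the batch (a cyclic shift).
    `model(x_t, t, y)` and the weight `lam` are placeholders.
    """
    rng = np.random.default_rng(seed)
    n = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)                 # source (noise) samples
    t = rng.uniform(size=(n,) + (1,) * (x1.ndim - 1))  # one time per sample, broadcastable
    x_t = (1.0 - t) * x0 + t * x1                      # point on each sample's flow
    v_pos = x1 - x0                                    # target flow of the matched pair
    v_neg = np.roll(v_pos, shift=1, axis=0)            # flow of a different pair (needs n >= 2)
    v_pred = model(x_t, t, y)

    match = np.mean((v_pred - v_pos) ** 2)             # standard flow matching term
    contrast = np.mean((v_pred - v_neg) ** 2)          # distance to another pair's flow
    # Minimizing this pulls v_pred toward its own flow and away from the other pair's flow.
    return float(match - lam * contrast)


# Usage with a toy "model" that always predicts zero flow.
x1 = np.random.default_rng(1).standard_normal((8, 4))
y = np.arange(8) % 2
zero_model = lambda x_t, t, y: np.zeros_like(x_t)
print(contrastive_flow_matching_loss(zero_model, x1, y, lam=0.05, seed=0))
```

The cyclic shift guarantees that every sample is contrasted with a different pair, which a random permutation would not (it can map a sample to itself). Setting `lam=0` reduces the sketch to plain conditional flow matching.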
