COT Flow: Learning Optimal-Transport Image Sampling and Editing by Contrastive Pairs
Xinrui Zu, Qian Tao
TL;DR
This paper introduces Contrastive Optimal Transport Flow (COT Flow), a method that unifies optimal transport with diffusion/flow-based generative models to achieve fast, high-quality sampling from arbitrary priors and enhanced zero-shot editing. Central to the approach are COT Pairs, a training scheme that leverages entropic OT trajectories and contrastive-like encodings, and the COT Editor, which enables flexible editing via dual-channel inputs and self-augmentation. The method addresses the generative learning trilemma by directly learning the transport flow between unpaired sources, enabling one-step sampling and competitive unpaired I2I translation quality, while offering new zero-shot editing capabilities such as COT composition and shape-texture coupling. Empirical results demonstrate strong performance on standard unpaired I2I tasks and showcase versatile editing scenarios, with ablations confirming the benefits of the proposed COT Pair design and training strategy. Overall, COT Flow provides a practical, OT-grounded pathway to fast and flexible generative modeling and editing.
Abstract
Diffusion models have demonstrated strong performance in sampling and editing multi-modal data with high generation quality, yet their iterative generation process is computationally expensive and slow. In addition, most methods are constrained to generate data from Gaussian noise, which limits their sampling and editing flexibility. To overcome both disadvantages, we present Contrastive Optimal Transport Flow (COT Flow), a new method that achieves fast and high-quality generation with improved zero-shot editing flexibility compared to previous diffusion models. Because it builds on optimal transport (OT), our method places no restriction on the prior distribution, which enables unpaired image-to-image (I2I) translation and doubles the editable space (at both the start and the end of the trajectory) compared to other zero-shot editing methods. In terms of quality, COT Flow generates results in a single step that are competitive with previous state-of-the-art unpaired I2I translation methods. To highlight the advantages that OT brings to COT Flow, we introduce the COT Editor, which performs user-guided editing with high flexibility and quality. The code will be released at https://github.com/zuxinrui/cot_flow.
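As a rough illustration of how OT can couple two unpaired sample sets without a Gaussian prior, the sketch below computes an entropic OT plan with plain Sinkhorn iterations on toy 2-D points and draws index pairs from it. It is a generic textbook example, not the COT Pair training scheme from this paper; the function name `sinkhorn_plan`, the regularisation value `eps`, the iteration count, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(x, y, eps=0.1, n_iters=500):
    """Entropic OT plan between uniform empirical measures on x and y.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Squared Euclidean cost between every source/target pair.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()          # rescale to [0, 1] for numerical stability
    K = np.exp(-cost / eps)           # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x)) # uniform source weights
    b = np.full(len(y), 1.0 / len(y)) # uniform target weights
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # rescale rows toward the source marginal
        v = b / (K.T @ u)             # rescale columns to match the target marginal
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))            # stand-in for unpaired source samples
y = rng.normal(loc=3.0, size=(64, 2))   # stand-in for unpaired target samples
plan = sinkhorn_plan(x, y)

# Draw 16 (source, target) index pairs with probability given by the plan.
flat = rng.choice(plan.size, size=16, p=(plan / plan.sum()).ravel())
i, j = np.unravel_index(flat, plan.shape)
pairs = np.stack([x[i], y[j]], axis=1)  # shape (16, 2, 2)
```

Pairs sampled this way tend to match nearby source and target points, which is the general sense in which an OT coupling supplies training pairs between unpaired sets; smaller `eps` gives sharper couplings at the cost of slower convergence.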
