TReFT: Taming Rectified Flow Models For One-Step Image Translation

Shengqian Li; Ming Gao; Yi Liu; Zuzeng Lin; Feng Wang; Feng Dai

TL;DR

This paper addresses the costly multi-step denoising that keeps Rectified Flow (RF) models from real-time image translation. It introduces TReFT, a simple yet effective finetuning strategy that takes the velocity predicted by a pretrained DiT or UNet at the final denoising stage directly as the output, enabling real-time one-step translation. The authors provide theoretical backing (Theorem 1 and Theorem 2) showing that, as denoising nears completion, the RF velocity converges to the vector from the origin to the final clean image, which justifies using it as a one-shot output. With latent cycle-consistency and identity losses and lightweight architectural simplifications, TReFT achieves competitive or state-of-the-art results on multiple unpaired and paired translation benchmarks while keeping inference fast, showing that pretrained RF models can be used for real-time image translation.

Abstract

Rectified Flow (RF) models have advanced high-quality image and video synthesis via optimal transport theory. However, when applied to image-to-image translation, they still depend on costly multi-step denoising, which hinders real-time applications. Although the recent adversarial training paradigm CycleGAN-Turbo enables one-step image translation with pretrained diffusion models, we find that directly applying it to RF models leads to severe convergence issues. In this paper, we analyze these challenges and propose TReFT, a novel method to Tame Rectified Flow models for one-step image Translation. Unlike previous works, TReFT directly uses the velocity predicted by a pretrained DiT or UNet as the output, a simple yet effective design that resolves the convergence issues of adversarial training with one-step inference. This design is mainly motivated by a novel observation: near the end of the denoising process, the velocity predicted by pretrained RF models converges to the vector from the origin to the final clean image, a property we further justify through theoretical analysis. When applying TReFT to large pretrained RF models such as SD3.5 and FLUX, we introduce memory-efficient latent cycle-consistency and identity losses during training, as well as lightweight architectural simplifications for faster inference. Pretrained RF models finetuned with TReFT achieve performance comparable to state-of-the-art methods across multiple image translation datasets while enabling real-time inference.
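The velocity observation can be checked by hand in a one-dimensional toy case. The sketch below is only an illustration under stated assumptions, not the paper's analysis or its theorems: it assumes the common RF interpolation x_t = t*x1 + (1-t)*x0 with noise x0 ~ N(0, 1) at t = 0 and data x1 at t = 1 (the abstract does not state a time convention), and it assumes Gaussian data x1 ~ N(mu, sigma^2) so that the ideal velocity E[x1 - x0 | x_t = x] has a closed form. The function names and parameter values are hypothetical.

```python
def gaussian_rf_velocity(x, t, mu=2.0, sigma=0.5):
    """Ideal RF velocity E[x1 - x0 | x_t = x] for a 1-D Gaussian toy problem.

    Assumptions (not from the paper): x_t = t*x1 + (1-t)*x0,
    x0 ~ N(0, 1) is noise, x1 ~ N(mu, sigma^2) is data, x0 and x1 independent.
    The result follows from conditioning one jointly Gaussian variable on another.
    """
    cov = t * sigma**2 - (1.0 - t)          # Cov(x1 - x0, x_t)
    var = t**2 * sigma**2 + (1.0 - t) ** 2  # Var(x_t)
    return mu + cov / var * (x - t * mu)    # E[x1 - x0] = mu, E[x_t] = t*mu


def velocity_gap(x, t, **kwargs):
    """Distance between the velocity at (x, t) and the point x itself."""
    return abs(gaussian_rf_velocity(x, t, **kwargs) - x)


# As t approaches the end of denoising (t -> 1 in this convention),
# the velocity evaluated at a nearly clean sample x approaches x.
timesteps = (0.5, 0.9, 0.99, 1.0)
gaps = [velocity_gap(1.7, t) for t in timesteps]
for t, gap in zip(timesteps, gaps):
    print(f"t={t:<5} |v(x, t) - x| = {gap:.4f}")
```

In this toy setting the gap shrinks as t approaches 1 and vanishes at t = 1, which is consistent with reading the final-step velocity as the clean image itself. Real image distributions are not Gaussian, so this only conveys the intuition.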
