One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation

Jianze Li, Jiezhang Cao, Yong Guo, Wenbo Li, Yulun Zhang

TL;DR

Diffusion-based Real-ISR methods offer high realism but suffer from expensive multi-step inference. FluxSR presents a one-step Real-ISR framework built on FLUX.1-dev. It introduces Flow Trajectory Distillation (FTD) to derive a one-step super-resolution (SR) flow from the multi-step flow of a large text-to-image (T2I) model while keeping the teacher flow fixed, which enables offline distillation into a single-step generator. The approach adds TV-LPIPS and an Attention Diversification Loss to mitigate high-frequency artifacts, along with a large-model-friendly training strategy that avoids online teacher inference. Empirical results show that FluxSR achieves state-of-the-art performance among one-step methods and competitive realism against multi-step baselines on real-world datasets, at the cost of a higher parameter count and more compute.
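
The "triangle" intuition behind keeping the teacher flow fixed can be illustrated with straight-line (rectified-flow) paths. The sketch below makes several assumptions and is not the authors' FTD loss. It assumes a FLUX-style convention in which t = 0 is data and t = 1 is noise. It uses random arrays as stand-ins for latents, a hypothetical LR endpoint made by perturbing the HR one, and hypothetical helper names (`interpolate`, `straight_velocity`). Under these assumptions, the velocity of the SR edge equals the difference between the two noise-anchored velocities. Once the noise-to-HR and noise-to-LR trajectories are fixed, the SR trajectory is therefore determined.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for latents (hypothetical shapes, not real FLUX latents).
eps = rng.standard_normal((4, 8, 8))              # Gaussian noise
x_h = rng.standard_normal((4, 8, 8))              # "HR" endpoint the teacher reaches from eps
x_l = x_h + 0.3 * rng.standard_normal((4, 8, 8))  # hypothetical degraded "LR" endpoint


def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1


def straight_velocity(x0, x1):
    """Time derivative of interpolate(x0, x1, t); constant along a straight path."""
    return x1 - x0


# Velocities along the three edges of the triangle, all with t increasing.
u_hr = straight_velocity(x_h, eps)  # HR -> noise (assumed FLUX-style: t=0 data, t=1 noise)
u_lr = straight_velocity(x_l, eps)  # LR -> noise
u_sr = straight_velocity(x_h, x_l)  # HR -> LR; integrating t from 1 to 0 goes LR -> HR

# Triangle identity: the SR edge is fixed once the two noise-anchored edges are fixed.
assert np.allclose(u_sr, u_hr - u_lr)

# A single Euler step from t=1 (the LR end) to t=0 along the straight SR path lands on x_h.
x_one_step = x_l - 1.0 * u_sr
print("max |one-step - HR|:", np.abs(x_one_step - x_h).max())
```

The identity is exact only for straight paths. A trained model's sampling trajectories are generally curved, so this is an idealization of why constraining the SR velocity with the other two edges keeps the HR endpoint, and hence the data distribution, in place.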

Abstract

Diffusion models (DMs) have significantly advanced real-world image super-resolution (Real-ISR), but the computational cost of multi-step diffusion models limits their application. One-step diffusion models generate high-quality images in a single sampling step, greatly reducing computational overhead and inference latency. However, most existing one-step diffusion methods are constrained by the performance of their teacher model, so a weak teacher produces image artifacts. To address this limitation, we propose FluxSR, a novel one-step diffusion Real-ISR technique based on flow matching models. We use the state-of-the-art diffusion model FLUX.1-dev as both the teacher model and the base model. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR model. Second, to improve image realism and suppress high-frequency artifacts in generated images, we propose TV-LPIPS as a perceptual loss and introduce Attention Diversification Loss (ADL) as a regularization term that reduces token similarity in the transformer, thereby eliminating these artifacts. Comprehensive experiments demonstrate that our method outperforms existing one-step diffusion-based Real-ISR methods. The code and model will be released at https://github.com/JianzeLi-114/FluxSR.
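
The two artifact-related terms can be illustrated in simplified form, but the abstract does not define them, so everything below is an assumption. `total_variation` is a standard anisotropic total variation, normalized as a mean absolute difference between neighboring pixels. It grows sharply when a high-frequency periodic pattern is added to an image. How TV-LPIPS actually combines TV with LPIPS is not stated here, so no LPIPS call is included. `token_similarity_penalty` is a hypothetical stand-in for ADL. It penalizes the mean cosine similarity between distinct transformer tokens, and the paper's ADL may be defined differently.

```python
import numpy as np


def total_variation(img):
    """Mean-normalized anisotropic TV: mean |difference| between adjacent pixels.

    img: array of shape (H, W) or (C, H, W).
    """
    dh = np.abs(np.diff(img, axis=-2)).mean()  # vertical neighbors
    dw = np.abs(np.diff(img, axis=-1)).mean()  # horizontal neighbors
    return dh + dw


def token_similarity_penalty(tokens):
    """Hypothetical ADL-style term: mean cosine similarity over distinct token pairs.

    tokens: array of shape (N, D), one row per transformer token.
    """
    normed = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T                 # (N, N) cosine similarities
    n = sim.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)]  # drop each token's self-similarity
    return off_diag.mean()


# A smooth horizontal ramp vs. the same ramp plus a toy checkerboard artifact
# made of 8x8 blocks, i.e. a 16-pixel period in both directions.
h = w = 64
smooth = np.tile(np.linspace(0.0, 1.0, w), (h, 1))
yy, xx = np.mgrid[0:h, 0:w]
artifact = 0.1 * (((yy // 8) + (xx // 8)) % 2)
noisy = smooth + artifact
print("TV smooth:", total_variation(smooth), "TV with artifact:", total_variation(noisy))

# Near-identical tokens score close to 1; independent random tokens score near 0.
rng = np.random.default_rng(0)
base = rng.standard_normal(32)
collapsed = base + 0.01 * rng.standard_normal((16, 32))
diverse = rng.standard_normal((16, 32))
print("collapsed:", token_similarity_penalty(collapsed),
      "diverse:", token_similarity_penalty(diverse))
```

Minimizing a term like `token_similarity_penalty` pushes tokens apart. This matches the stated goal of reducing token similarity, while TV provides a simple signal that rises with periodic high-frequency patterns.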

Paper Structure

This paper contains 16 sections, 23 equations, 5 figures, 4 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Visual comparisons of different Real-ISR methods. Top: Comparison between FluxSR and state-of-the-art one-step diffusion methods. Bottom: Comparison between FluxSR and state-of-the-art multi-step diffusion methods. Our proposed FluxSR generates more realistic images with high-frequency details.
  • Figure 2: Difference between existing methods and our Flow Trajectory Distillation. (Left) Starting from a model pre-trained to map noise $\epsilon$ to images $x_0$, existing one-step diffusion methods fine-tune it to map LR images to HR images $x_H$. This may shift the generated distribution (orange) away from the real data distribution (blue). (Right) To map the LR image distribution (green) to the real data distribution, we propose Flow Trajectory Distillation, which constrains $u_t^{SR}$ using the other two trajectories of the triangle so that the real data distribution (blue) does not shift.
  • Figure 3: Training framework of FluxSR. (Top) Multi-step inference process of the pre-trained FLUX model. (Middle) Training strategy of FluxSR. (Bottom) Computation process of FTD. We distill a one-step super-resolution model from the multi-step FLUX model without running the teacher model online during training.
  • Figure 4: Examples of Pronounced Periodic Artifacts During Training. Left: 256-pixel image with noticeable periodic high-frequency artifacts. Right: 64-pixel zoomed-in region, showing artifacts with four cycles in both width and height.
  • Figure 5: Visual comparisons ($\times$4) on the Real-ISR task.
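
The numbers in Figure 4 fix the artifact period: four cycles across a 64-pixel crop is a period of 64 / 4 = 16 pixels, or 256 / 16 = 16 cycles across the full 256-pixel image. For reference, FLUX encodes images with an 8× downsampling VAE and packs latents into 2 × 2 patches, so each transformer token covers 16 × 16 pixels. That matches this period, though the link is offered here only as a plausible interpretation. The sketch below shows how such a period could be measured from a Fourier spectrum. It uses a synthetic image as a hypothetical stand-in for a generated output and a hypothetical helper, `dominant_cycles`.

```python
import numpy as np


def dominant_cycles(profile, min_cycles=2):
    """Number of cycles of the strongest frequency in a 1D profile.

    Frequencies below `min_cycles` (DC and slow image content) are ignored.
    """
    spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
    spectrum[:min_cycles] = 0.0
    return int(np.argmax(spectrum))


size, period = 256, 16
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:size, 0:size]
# Hypothetical stand-in for a generated image: slow "content", a faint
# 16-pixel-period artifact in both directions, and a little noise.
content = 0.1 * np.sin(2 * np.pi * xx / size)
artifact = 0.05 * (np.cos(2 * np.pi * xx / period) + np.cos(2 * np.pi * yy / period))
img = content + artifact + 0.01 * rng.standard_normal((size, size))

# Average over rows to get a horizontal profile, then count cycles.
full_cycles = dominant_cycles(img.mean(axis=0))            # across the full 256-px width
crop_cycles = dominant_cycles(img[:64, :64].mean(axis=0))  # across a 64-px crop
print("full image:", full_cycles, "cycles, period", size / full_cycles, "px")
print("64-px crop:", crop_cycles, "cycles, period", 64 / crop_cycles, "px")
```

The same check applied along columns (`img.mean(axis=1)`) would measure the vertical period. Ignoring the lowest bins keeps smooth image content from masking the artifact peak.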