Consistency Trajectory Matching for One-Step Generative Super-Resolution

Weiyi You, Mingyang Zhang, Leheng Zhang, Xingyu Zhou, Kexuan Shi, Shuhang Gu

TL;DR

This work introduces Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free framework that enables one-step, photo-realistic SR by learning a PF-ODE trajectory from noisy LR to HR through Consistency Training (CT) and refining realism with Distribution Trajectory Matching (DTM). CT directly maps trajectory points along the PF-ODE to the final HR, avoiding pre-trained diffusion teachers, while DTM aligns the SR trajectory with the natural-image distribution at the distribution level. The method achieves competitive or superior perceptual quality compared to diffusion-based baselines on both synthetic and real-world datasets, with markedly lower inference latency. CTMSR thus provides a scalable, backbone-independent, and efficient solution for high-quality one-step super-resolution.

Abstract

Current diffusion-based super-resolution (SR) approaches achieve commendable performance at the cost of high inference overhead. Distillation techniques are therefore used to compress a multi-step teacher model into a one-step student model. However, these methods significantly raise training costs and cap the student model's performance at that of the teacher. To overcome these challenges, we propose Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free strategy that generates photo-realistic SR results in one step. Concretely, we first formulate a Probability Flow Ordinary Differential Equation (PF-ODE) trajectory to establish a deterministic mapping from noisy low-resolution (LR) images to high-resolution (HR) images. We then apply the Consistency Training (CT) strategy to learn this mapping directly in one step, eliminating the need for a pre-trained diffusion model. To further enhance performance and better leverage the ground truth during training, we align the distribution of SR results more closely with that of natural images. To this end, we minimize the discrepancy between their respective PF-ODE trajectories from the LR image distribution with our Distribution Trajectory Matching (DTM) loss, which improves the realism of the recovered HR images. Comprehensive experiments demonstrate that the proposed method attains comparable or even superior performance on both synthetic and real datasets while maintaining minimal inference latency.
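
As a rough illustration of the Consistency Training idea in the abstract, the sketch below computes a consistency loss between two points on the same noisy-LR-to-HR trajectory. It is not the authors' implementation: the forward process in `perturb` (a residual shift toward the LR image plus Gaussian noise), the parameter `kappa`, the timestep variables `eta_lo` and `eta_hi`, and all function names are assumptions made for illustration, and the distance is plain mean squared error.

```python
import numpy as np


def perturb(x0, y, eta, eps, kappa=2.0):
    """Hypothetical forward process (an assumption, not taken from the paper).

    Moves the HR image x0 toward the (upsampled, same-shape) LR image y by a
    fraction eta in [0, 1] and adds Gaussian noise scaled by kappa * sqrt(eta).
    At eta = 0 this returns x0 unchanged.
    """
    return x0 + eta * (y - x0) + kappa * np.sqrt(eta) * eps


def consistency_training_loss(f_online, f_target, x0, y, eta_lo, eta_hi, rng):
    """Consistency loss between two points that share one noise sample.

    f_online and f_target are callables (x_t, y, eta) -> predicted HR image.
    The target output is treated as a constant, i.e. no gradient would flow
    through it in an autodiff framework.
    """
    eps = rng.standard_normal(x0.shape)      # one noise sample for both points
    x_hi = perturb(x0, y, eta_hi, eps)       # point further from the HR image
    x_lo = perturb(x0, y, eta_lo, eps)       # adjacent point closer to HR
    pred = f_online(x_hi, y, eta_hi)
    target = f_target(x_lo, y, eta_lo)
    return float(np.mean((pred - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal((8, 8))         # stand-in for an HR image
    y = rng.standard_normal((8, 8))          # stand-in for an upsampled LR image
    toy_model = lambda x_t, y, eta: x_t      # placeholder network (identity)
    loss = consistency_training_loss(toy_model, toy_model, x0, y, 0.2, 0.6, rng)
    print("toy consistency loss:", loss)
```

In a real training loop, `f_target` would typically be a gradient-free copy of the online network (for example, an exponential moving average of its weights), so that only `f_online` is updated; no pre-trained diffusion teacher appears anywhere in the loss.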

Paper Structure

This paper contains 15 sections, 25 equations, 13 figures, 8 tables, 3 algorithms.

Figures (13)

  • Figure 1: An illustrative comparison of vanilla distillation and our proposed Consistency Trajectory Matching for SR. In contrast to vanilla distillation, Consistency Training directly learns the deterministic mapping from the noisy LR distribution to the natural image distribution to achieve one-step inference, and DTM is proposed to further enhance the realism of SR results.
  • Figure 2: The pipeline of the proposed CTMSR. We first train CTMSR with the CT loss until convergence to obtain a pre-trained CTMSR ($f_{\theta'}$), whose parameters are then frozen. Because the pre-trained CTMSR can construct the PF-ODE trajectory from one distribution to another, we feed $\hat{x}_{t'}$ and $x_{t'}$ into it to obtain the trajectories of the fake ODE and the real ODE, namely $x_\text{fake}$ and $x_\text{real}$, respectively. We then compute $\nabla_\theta\mathcal{L}_{\mathrm{DTM}}$, which matches the two trajectories and thereby penalizes the distribution discrepancy between our SR results and real images at the trajectory level. Backpropagating $\nabla_\theta\mathcal{L}_{\mathrm{DTM}}$ to the CTMSR being trained further enhances the realism of its SR results.
  • Figure 3: Visual comparisons of different methods on two synthetic examples of the ImageNet-Test dataset.
  • Figure 4: Visual comparisons of different methods on two examples of real-world datasets. Please zoom in for more details.
  • Figure 5: A visual comparison between the impact of DTM and SDS. It can be observed that DTM restores more details and produces fewer artifacts compared to the other two methods.
  • ...and 8 more figures