CTSR: Controllable Fidelity-Realness Trade-off Distillation for Real-World Image Super Resolution

Runyi Li, Bin Chen, Jian Zhang, Radu Timofte

TL;DR

CTSR addresses the fidelity-realness trade-off in real-world image SR with a dual-teacher distillation framework that first fuses fidelity and realness priors and then enables a continuous trade-off through flow-matching–style distillation guided by diffusion timesteps. In Stage 1, the method distills a diffusion-based SR model, using a fidelity-prior teacher $\mathcal{T}_f$ and a realness-prior teacher $\mathcal{T}_r$, into a student $\mathcal{S}$. In Stage 2, it refines $\mathcal{S}$ to map diffusion sampling across timesteps within the Stage 1 solution set, so that the output can be controlled via $t \in [0,1]$. Empirically, CTSR achieves state-of-the-art or competitive performance on real-world SR benchmarks, excelling in realness metrics while maintaining strong fidelity, and it reduces both trainable parameters and inference steps. The approach extends to other tasks such as low-light enhancement, which suggests that fidelity-realness distillation generalizes and that diffusion-based controllability is practical for perceptually guided image restoration.
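To make the Stage 1 idea concrete, the following is a minimal sketch of a two-teacher distillation objective, assuming a plain weighted sum of mean-squared errors. The function name `dual_teacher_loss`, the weights `w_f` and `w_r`, and the random arrays standing in for model outputs are all hypothetical; the summary does not state the actual loss used by CTSR, which may differ substantially.

```python
import numpy as np

def dual_teacher_loss(student_out, fidelity_out, realness_out, w_f=0.5, w_r=0.5):
    """Weighted sum of MSEs between the student output and each teacher output.

    Assumption: this simple form is illustrative only and is not taken
    from the paper.
    """
    loss_f = np.mean((student_out - fidelity_out) ** 2)  # pull toward fidelity teacher
    loss_r = np.mean((student_out - realness_out) ** 2)  # pull toward realness teacher
    return w_f * loss_f + w_r * loss_r

# Random arrays stand in for the two teachers' SR outputs (hypothetical data).
rng = np.random.default_rng(0)
fidelity_out = rng.random((8, 8, 3))
realness_out = rng.random((8, 8, 3))

# A student that sits halfway between the teachers.
midpoint = 0.5 * (fidelity_out + realness_out)

loss_at_midpoint = dual_teacher_loss(midpoint, fidelity_out, realness_out)
loss_at_fidelity = dual_teacher_loss(fidelity_out, fidelity_out, realness_out)
```

Under this assumed objective with equal weights, a student output halfway between the two teachers incurs a lower loss than one that copies either teacher, which is one simple way to see why a student can end up combining both priors.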

Abstract

Real-world image super-resolution is a critical image processing task, where two key evaluation criteria are the fidelity to the original image and the visual realness of the generated results. Although existing methods based on diffusion models excel in visual realness by leveraging strong priors, they often struggle to achieve an effective balance between fidelity and realness. In our preliminary experiments, we observe that a linear combination of multiple models outperforms individual models, motivating us to harness the strengths of different models for a more effective trade-off. Based on this insight, we propose a distillation-based approach that leverages the geometric decomposition of both fidelity and realness, alongside the performance advantages of multiple teacher models, to strike a more balanced trade-off. Furthermore, we explore the controllability of this trade-off, enabling a flexible and adjustable super-resolution process, which we call CTSR (Controllable Trade-off Super-Resolution). Experiments conducted on several real-world image super-resolution benchmarks demonstrate that our method surpasses existing state-of-the-art approaches, achieving superior performance across both fidelity and realness metrics.
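The preliminary observation about linearly combining models can be illustrated with a short sketch. It assumes the combination is a per-pixel convex blend of two SR outputs; the helper `linear_blend`, the weight `alpha`, and the random arrays used as stand-in images are hypothetical, and `alpha` is not the control variable $t$ of CTSR.

```python
import numpy as np

def linear_blend(fidelity_sr, realness_sr, alpha):
    """Per-pixel convex combination of two SR outputs.

    alpha = 0 returns the fidelity-oriented output, alpha = 1 the
    realness-oriented one. Assumption: images are float arrays in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    blended = (1.0 - alpha) * fidelity_sr + alpha * realness_sr
    return np.clip(blended, 0.0, 1.0)

# Random arrays stand in for the outputs of two different SR models.
rng = np.random.default_rng(0)
fidelity_sr = rng.random((16, 16, 3))
realness_sr = rng.random((16, 16, 3))

# Sweep the blend weight to obtain a family of candidate outputs.
alphas = np.linspace(0.0, 1.0, 5)
candidates = [linear_blend(fidelity_sr, realness_sr, a) for a in alphas]
```

Each candidate could then be scored with a fidelity metric and a realness metric to trace out a trade-off curve. Blending at inference time requires running both models, which is part of what motivates distilling the combination into a single student.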

Paper Structure

This paper contains 20 sections, 9 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: (a) Controllable trade-off of our proposed CTSR, which can be adjusted freely between better fidelity and better realness. (b) Comparison of current state-of-the-art (SOTA) real-world image SR methods and CTSR in terms of performance and efficiency. A larger bubble indicates a longer inference time. The closer a method's bubble is to the top-right corner of the figure, the better its performance in both fidelity and realness. For our controllable trade-off method, we select six different states and present their performance. The purple curve shows continuously adjusted trade-off points, all of which exhibit performance advantages. (c) Comparison of CTSR with current SOTA methods on multiple metrics. All results are obtained on the DIV2K validation set, 4$\times$ SR with real-world degradation.
  • Figure 2: Illustration for vector decomposition in the image super-resolution process. It shows the simple linear approach, which serves as the motivation of our proposed CTSR.
  • Figure 3: Illustration of our proposed CTSR. (a) In the first stage, we distill the student model from two teacher models, one with better fidelity performance and one with better realness performance. (b) In the second stage, we distill the model obtained from the first stage into a continuous mapping to SR results with different trade-offs between fidelity and realness.
  • Figure 4: Visualized results of evaluation on the RealSR testset, with our proposed CTSR ($t=0.0$) and compared methods.
  • Figure 5: Visualized calculation process of $\mathcal{L}_{fl}$.
  • ...and 1 more figures