FiDeSR: High-Fidelity and Detail-Preserving One-Step Diffusion Super-Resolution

Aro Kim, Myeongjin Jang, Chaewon Moon, Youngjin Shin, Jinwoo Jeong, Sang-hyo Park

TL;DR

FiDeSR is a high-fidelity, detail-preserving one-step diffusion super-resolution framework that outperforms existing diffusion-based methods on real-world SR, producing outputs with both high perceptual quality and faithful content restoration.

Abstract

Diffusion-based approaches have recently driven remarkable progress in real-world image super-resolution (SR). However, existing methods still struggle to simultaneously preserve fine details and ensure high-fidelity reconstruction, often resulting in suboptimal visual quality. In this paper, we propose FiDeSR, a high-fidelity and detail-preserving one-step diffusion super-resolution framework. During training, we introduce a detail-aware weighting strategy that adaptively emphasizes regions where the model exhibits higher prediction errors. During inference, low- and high-frequency adaptive enhancers further refine the reconstruction without requiring model retraining, enabling flexible enhancement control. To further improve the reconstruction accuracy, FiDeSR incorporates a residual-in-residual noise refinement, which corrects prediction errors in the diffusion noise and enhances fine detail recovery. FiDeSR achieves superior real-world SR performance compared to existing diffusion-based methods, producing outputs with both high perceptual quality and faithful content restoration. The source code will be released at: https://github.com/Ar0Kim/FiDeSR.
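The abstract describes the detail-aware weighting (DAW) strategy only at a high level: regions where the model's prediction error is higher receive more emphasis in the training loss. The sketch below is one plausible realization under stated assumptions, not the paper's formula, which this summary does not give. It assumes a per-element weight w = 1 + alpha * e_norm, where e_norm is the absolute error normalized to [0, 1] per sample, and applies it to an L2 loss. The names `detail_aware_weights`, `detail_aware_l2_loss`, and `alpha` are hypothetical.

```python
import numpy as np


def detail_aware_weights(pred, target, alpha=1.0, eps=1e-8):
    """Hypothetical detail-aware weight map (assumed form, not the paper's).

    pred, target: arrays shaped (batch, ...). Elements with larger absolute
    error get larger weights. In an autodiff framework these weights would be
    computed from a detached error so they act as constants in the loss.
    """
    err = np.abs(pred - target)
    # Normalize each sample's error map to [0, 1] so alpha sets the emphasis.
    axes = tuple(range(1, err.ndim))
    norm_err = err / (err.max(axis=axes, keepdims=True) + eps)
    # Weight 1 everywhere, plus up to alpha extra on the highest-error elements.
    return 1.0 + alpha * norm_err


def detail_aware_l2_loss(pred, target, alpha=1.0):
    """Weighted mean squared error using the hypothetical weight map above."""
    w = detail_aware_weights(pred, target, alpha)
    return float(np.mean(w * (pred - target) ** 2))


rng = np.random.default_rng(0)
target = rng.standard_normal((2, 4, 8, 8))  # toy batch of latents
pred = target + 0.1 * rng.standard_normal(target.shape)
plain = detail_aware_l2_loss(pred, target, alpha=0.0)     # reduces to plain MSE
weighted = detail_aware_l2_loss(pred, target, alpha=2.0)  # emphasizes large errors
print(plain, weighted)
```

With `alpha=0.0` the loss reduces to ordinary MSE; a larger `alpha` increases the contribution of the elements the model currently gets most wrong, which is the behavior the abstract attributes to DAW.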

Paper Structure

This paper contains 34 sections, 16 equations, 12 figures, 9 tables, and 1 algorithm.

Figures (12)

  • Figure 1: Performance comparison among Real-ISR methods on three perceptual–fidelity metric pairs: PSNR vs. MANIQA (left), SSIM vs. MANIQA (middle), and LPIPS vs. MANIQA (right). Higher MANIQA, PSNR, and SSIM values and lower LPIPS values indicate better performance. FiDeSR achieves superior perceptual quality while maintaining competitive fidelity across all three metric pairs. All methods are evaluated on the DRealSR dataset.
  • Figure 2: Example failure cases of diffusion-based Real-ISR methods. (b) AddSR introduces structural distortion and low-frequency inconsistency. (c) OSEDiff loses high-frequency details, producing over-smoothed textures. (d) PiSA-SR generates excessive details. In contrast, (e) our method achieves both high fidelity and detail preservation.
  • Figure 3: Overall framework of FiDeSR. (a) Training process: the LQ image $x_L$ is encoded into a latent $z_L$, from which the Diffusion Network predicts a coarse residual $r$; the LRRB refines this residual to produce the refined latent $z_r$. The training loss is guided by the DAW module to emphasize fine structural details, and the refined latent $z_r$ is decoded by the VAE Decoder. (b) Inference process: following the single-step diffusion process, the Diffusion Network predicts a residual from the LQ latent $z_L$, which is then refined by the LRRB. The refined latent $z_r$ is enhanced in its frequency components by the LFIM and finally decoded by the VAE Decoder to produce the super-resolution (SR) image $x_{SR}$.
  • Figure 4: Comparison of the pipelines of ESRGAN, PiSA-SR, and our FiDeSR. (a) ESRGAN employs RRDB-based restoration in pixel space. (b) PiSA-SR uses a one-step diffusion network to predict a global residual in the latent space. (c) FiDeSR introduces a latent residual refinement block (LRRB) that progressively refines the residual.
  • Figure 5: Qualitative comparisons with state-of-the-art DM-based SR methods.
  • ...and 7 more figures
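Figure 3(b) describes the inference data flow: the Diffusion Network predicts a residual from $z_L$, the LRRB refines it into the refined latent $z_r$, and the LFIM adjusts frequency components before VAE decoding. The NumPy sketch below mirrors that flow on a toy latent under loudly stated assumptions: the diffusion network is replaced by a fixed linear stand-in, the LRRB is modeled as a stack of hypothetical 1x1 channel-mixing residual blocks whose output is added back onto $z_L$ (the sign convention is assumed), and the frequency module is approximated by an FFT low-pass split with user-set `low_gain` and `high_gain` factors. None of these functions are the authors' modules; all names are illustrative.

```python
import numpy as np


def toy_residual_predictor(z_lq):
    """Hypothetical stand-in for the one-step Diffusion Network.

    The real network is a large learned model; a fixed linear map keeps
    this sketch runnable.
    """
    return -0.1 * z_lq


def refine_residual(r, blocks):
    """Hypothetical LRRB-style refinement: a stack of inner residual blocks.

    Each block applies a 1x1 channel-mixing matrix to tanh(r) and adds the
    result back onto r (inner residual connection).
    """
    refined = r
    for weight in blocks:  # weight has shape (C, C)
        correction = np.einsum("oc,chw->ohw", weight, np.tanh(refined))
        refined = refined + correction
    return refined


def split_frequencies(z, cutoff=0.25):
    """Split a (C, H, W) latent into low- and high-frequency parts via FFT."""
    h, w = z.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Radial low-pass mask in normalized frequency units (cycles per sample).
    mask = (np.sqrt(fy**2 + fx**2) <= cutoff).astype(float)
    spectrum = np.fft.fft2(z, axes=(-2, -1))
    low = np.real(np.fft.ifft2(spectrum * mask, axes=(-2, -1)))
    return low, z - low


def frequency_enhance(z, low_gain=1.0, high_gain=1.0, cutoff=0.25):
    """Rescale low- and high-frequency bands; gains of 1.0 leave z unchanged."""
    low, high = split_frequencies(z, cutoff)
    return low_gain * low + high_gain * high


def super_resolve_latent(z_lq, blocks, low_gain=1.0, high_gain=1.2):
    """Toy version of the Figure 3(b) data flow, stopping before decoding."""
    r = toy_residual_predictor(z_lq)         # coarse residual r
    z_r = z_lq + refine_residual(r, blocks)  # outer residual: refined latent z_r
    return frequency_enhance(z_r, low_gain, high_gain)


rng = np.random.default_rng(0)
C = 4
blocks = [0.05 * rng.standard_normal((C, C)) for _ in range(3)]  # hypothetical weights
z_lq = rng.standard_normal((C, 16, 16))  # toy LQ latent
z_out = super_resolve_latent(z_lq, blocks)
print(z_out.shape)  # (4, 16, 16)
```

Setting both gains to 1.0 returns the refined latent unchanged, while raising `high_gain` above 1.0 boosts its high-frequency content. Because the gains are plain inference-time arguments, this toy version illustrates why such enhancement can be tuned without retraining, as the abstract states for FiDeSR's enhancers.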