HDW-SR: High-Frequency Guided Diffusion Model based on Wavelet Decomposition for Image Super-Resolution

Chao Yang, Boqian Zhang, Jinghao Xu, Guang Jiang

TL;DR

HDW-SR is a high-frequency guided diffusion framework for image super-resolution that diffuses only the residual and uses wavelet-based downsampling to preserve and exploit high-frequency detail. It combines HDW-Net (comprising HE-Net and HA-Net), built on lossless wavelet sampling, with a DFA-based encoder and a Dynamic Thresholding Block (DTB) that provide sparse, adaptive high-frequency guidance during diffusion. On synthetic and real-world datasets, it achieves competitive quantitative performance and improved detail fidelity, outperforming several diffusion-based and GAN-based baselines on no-reference quality metrics. The approach also supports multi-level wavelet decompositions, offering a practical path to sharper textures and edges in SR.

Abstract

Diffusion-based methods have shown great promise in single image super-resolution (SISR); however, existing approaches often produce blurred fine details due to insufficient guidance in the high-frequency domain. To address this issue, we propose a High-Frequency Guided Diffusion Network based on Wavelet Decomposition (HDW-SR), which replaces the conventional U-Net backbone in diffusion frameworks. Specifically, we perform diffusion only on the residual map, allowing the network to focus more effectively on high-frequency information restoration. We then introduce wavelet-based downsampling in place of standard CNN downsampling to achieve multi-scale frequency decomposition, enabling sparse cross-attention between the high-frequency subbands of the pre-super-resolved image and the low-frequency subbands of the diffused image for explicit high-frequency guidance. Moreover, a Dynamic Thresholding Block (DTB) is designed to refine high-frequency selection during the sparse attention process. During upsampling, the invertibility of the wavelet transform ensures low-loss feature reconstruction. Experiments on both synthetic and real-world datasets demonstrate that HDW-SR achieves competitive super-resolution performance, excelling particularly in recovering fine-grained image details. The code will be available after acceptance.
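The two ideas the abstract leans on, a residual diffusion target and invertible wavelet downsampling, can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the one-level Haar wavelet, the definition of the residual as the high-resolution image minus the pre-super-resolved image, and the names `haar_dwt2`, `haar_idwt2`, `hr`, and `pre_sr` do not come from the paper, which does not specify these details in this section.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar transform of an array with even height and width.

    Returns the low-frequency subband and a tuple of three detail
    (high-frequency) subbands, each at half the spatial resolution.
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0            # block average (low frequency)
    d1 = (a - b + c - d) / 2.0            # difference between columns
    d2 = (a + b - c - d) / 2.0            # difference between rows
    d3 = (a - b - c + d) / 2.0            # diagonal difference
    return ll, (d1, d2, d3)

def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2: rebuilds the full-resolution array."""
    d1, d2, d3 = details
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + d1 + d2 + d3) / 2.0
    x[0::2, 1::2] = (ll - d1 + d2 - d3) / 2.0
    x[1::2, 0::2] = (ll + d1 - d2 - d3) / 2.0
    x[1::2, 1::2] = (ll - d1 - d2 + d3) / 2.0
    return x

rng = np.random.default_rng(0)
hr = rng.random((8, 8))                            # toy "ground truth" image
pre_sr = hr + 0.05 * rng.standard_normal((8, 8))   # toy stand-in for a pre-SR output

# Assumed residual definition: what the diffusion model would have to restore.
residual = hr - pre_sr

# Wavelet "downsampling": halves the resolution without discarding information.
res_ll, res_details = haar_dwt2(residual)   # low-frequency part of the diffused signal
_, pre_hf = haar_dwt2(pre_sr)               # high-frequency guidance from the pre-SR image

# Invertibility: the upsampling path can recover the input exactly.
recon = haar_idwt2(res_ll, res_details)
```

In this toy setting, `recon` matches `residual` up to floating-point error, and `pre_sr + recon` recovers `hr`. The contrast with strided convolution or pooling is that nothing is thrown away at the downsampling step, so the detail subbands remain available as guidance.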

Paper Structure

This paper contains 23 sections, 8 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Existing methods achieve good performance in restoring the overall structure, but still have deficiencies in detail reconstruction (e.g. the lifebuoy on the boat).
  • Figure 2: Overview of Prior-based Residual Diffusion Network.
  • Figure 3: Overview of HDW-Net. HE-Net extracts the high-frequency component of $\tilde{X}$ and feeds it into the HA-Net on the right. The HA-Net encoder then computes cross-attention between high- and low-frequency components to extract detailed features, completing the diffusion process.
  • Figure 4: Overview of the DFA‑based encoder: DFA performs sparse cross‑attention between low‑ and high‑frequency wavelet components, while DTB dynamically selects K via inter‑class and intra‑class variances, supplanting Top‑K.
  • Figure 5: Visual comparisons of different DM-based SR methods on DIV2K.
  • ...and 1 more figure
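The Figure 4 caption says that DTB selects K from inter-class and intra-class variances rather than using a fixed Top-K. One way such a rule could work is sketched below, using an Otsu-style criterion: split the sorted attention scores into "kept" and "dropped" classes and choose the split that maximizes the inter-class variance (which, for a fixed set of scores, is equivalent to minimizing the intra-class variance). This is an assumption about the general idea, not the paper's DTB; the function names `select_k_by_variance` and `sparse_softmax` are hypothetical.

```python
import numpy as np

def select_k_by_variance(scores):
    """Pick how many scores to keep by maximizing inter-class variance.

    Hypothetical Otsu-style rule: scores are sorted in descending order and
    every split into a kept class (top k) and a dropped class is evaluated.
    """
    s = np.sort(np.asarray(scores, dtype=float))[::-1]
    n = s.size
    if n < 2:
        return n
    csum = np.cumsum(s)
    k = np.arange(1, n)                          # candidate sizes of the kept class
    mu_keep = csum[:-1] / k                      # mean of the top-k scores
    mu_drop = (csum[-1] - csum[:-1]) / (n - k)   # mean of the remaining scores
    w_keep = k / n                               # fraction of scores kept
    between = w_keep * (1.0 - w_keep) * (mu_keep - mu_drop) ** 2
    return int(k[np.argmax(between)])

def sparse_softmax(scores):
    """Softmax restricted to the adaptively selected top-k scores."""
    scores = np.asarray(scores, dtype=float)
    k = select_k_by_variance(scores)
    keep = np.argsort(scores)[::-1][:k]          # indices of the k largest scores
    masked = np.full_like(scores, -np.inf)       # dropped entries get zero weight
    masked[keep] = scores[keep]
    e = np.exp(masked - scores[keep].max())      # subtract max for numerical stability
    return e / e.sum(), k

# Toy attention scores with three clearly dominant entries.
weights, k = sparse_softmax([5.0, 4.8, 5.1, 0.1, 0.2, 0.0, 0.15])
```

Here the rule keeps the three dominant scores and assigns zero weight to the rest, without K being fixed in advance. A fixed Top-K would either admit weak entries or discard strong ones whenever the number of relevant high-frequency positions varies between queries.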