QDM: Quadtree-Based Region-Adaptive Sparse Diffusion Models for Efficient Image Super-Resolution

Donglin Yang, Paul Vicol, Xiaojuan Qi, Renjie Liao, Xiaofan Zhang

TL;DR

The paper tackles the inefficiency of uniform diffusion-based super-resolution by introducing QDM, which uses a quadtree-derived prior to locate detail-rich regions and to guide a mask-based sparse diffusion process. A dual-stream transformer architecture enables region-adaptive computation: the upstream processes global context, while the downstream refines only the detail-rich regions. Across real-world and medical CT SR benchmarks, QDM achieves competitive or superior SR quality while substantially reducing memory usage and compute. The approach offers practical benefits for resource-constrained deployment and suggests potential extensions to latent-space SR and broader diffusion-based restoration tasks.

Abstract

Deep learning-based super-resolution (SR) methods often perform pixel-wise computations uniformly across entire images, even in homogeneous regions where high-resolution refinement is redundant. We propose the Quadtree Diffusion Model (QDM), a region-adaptive diffusion framework that leverages a quadtree structure to selectively enhance detail-rich regions while reducing computations in homogeneous areas. By guiding the diffusion with a quadtree derived from the low-quality input, QDM identifies key regions (represented by leaf nodes) where fine detail is essential and applies minimal refinement elsewhere. This mask-guided, two-stream architecture adaptively balances quality and efficiency, producing high-fidelity outputs with low computational redundancy. Experiments demonstrate QDM's effectiveness in high-resolution SR tasks across diverse image types, particularly in medical imaging (e.g., CT scans), where large homogeneous regions are prevalent. Furthermore, QDM outperforms or is comparable to state-of-the-art SR methods on standard benchmarks while significantly reducing computational costs, highlighting its efficiency and suitability for resource-limited environments. Our code is available at https://github.com/linYDTHU/QDM.
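To make the quadtree idea concrete, the following is a minimal sketch of how a leaf-node mask could be derived from a low-quality input. It is not the authors' implementation: the function name `quadtree_mask`, the split criterion (intensity range of a block exceeding a threshold `s`), and the `min_block` parameter are assumptions made for illustration, and the sketch only handles square grayscale images whose side is a power of two.

```python
import numpy as np


def quadtree_mask(img, s, min_block=2):
    """Mark pixels that end up in the smallest quadtree blocks.

    Assumption (illustrative, not taken from the paper): a block is split
    into four children when its intensity range (max - min) exceeds `s`.
    Blocks that reach `min_block` are treated as detail-rich leaves.
    """
    h, w = img.shape
    assert h == w and h & (h - 1) == 0, "sketch expects a square power-of-two image"
    mask = np.zeros((h, w), dtype=bool)

    def split(y, x, size):
        block = img[y:y + size, x:x + size]
        if size <= min_block:
            # Smallest block reached: mark it as a detail-rich leaf.
            mask[y:y + size, x:x + size] = True
            return
        if block.max() - block.min() <= s:
            # Homogeneous block: stop subdividing, leave it unmasked.
            return
        half = size // 2
        for dy in (0, half):
            for dx in (0, half):
                split(y + dy, x + dx, half)

    split(0, 0, h)
    return mask


# Usage: a flat image with a single bright pixel in the top-left corner.
img = np.zeros((8, 8))
img[0, 0] = 1.0
mask = quadtree_mask(img, s=0.5)
print(mask.sum(), "of", mask.size, "pixels selected for refinement")
```

In this toy input only the quadrant containing the bright pixel is subdivided down to the smallest block size, so the remaining three quadrants stay outside the mask and would receive no fine-grained refinement.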

Paper Structure

This paper contains 15 sections, 8 equations, 9 figures, 8 tables, 1 algorithm.

Figures (9)

  • Figure 1: Original high-resolution (HR) natural images from DIV2K [agustsson2017ntire] and medical images from SegRap2023 [luo2025segrap2023] are downsampled to lower resolutions. The difference heatmaps reveal that only sparse detail-rich regions, such as edges and textures, require significant refinement, while large homogeneous areas remain largely unaffected.
  • Figure 2: Dual-Stream Architecture Overview. Our transformer-based model employs two complementary processing streams guided by a quadtree mask for input-adaptive computation. The upstream (top) extracts global context via Diffusion Transformer (DiT) blocks operating on $8\times8$ patches, while the downstream (bottom) refines $2\times2$ patches exclusively in mask-selected regions. Mask $M$ dynamically controls computational overhead by routing computationally intensive processing only to detail-rich areas. Selected tokens are partitioned into parallelizable chunks, each processed independently through cross-attention blocks. Final predictions combine the coarse upstream outputs with the refined downstream results.
  • Figure 3: Examples of Mask-Guided Diffusion Process. Left: Low-quality inputs with quadtree partitions (thresholds $s=0.06$ for top, $s=0.00$ for bottom). Right: Sampling trajectories of our QDM framework. Our diffusion process selectively adds noise to detail-rich regions (the quadtree leaf nodes, i.e., the smallest blocks in the left images, which are often sparse) while preserving homogeneous areas.
  • Figure 4: Visual comparison of different methods on real-world images and medical CT datasets. Zoom in for finer details.
  • Figure 5: Comparison of QDM, DiT [peebles2023scalable], U-Net [ronneberger2015u], and a Swin Attention-enhanced U-Net variant [liu2021swin] with high-resolution ($1024\times1024$) inputs. The left panel shows peak memory usage, while the right panel depicts inference speed across evaluated models. All metrics were measured on an NVIDIA A100 80G GPU. For QDM, the evaluation was conducted under the most resource-intensive condition, using a full mask without adaptive computation and processing 64 chunks in parallel.
  • ...and 4 more figures
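
The mask-guided routing described in the Figure 2 caption (fine $2\times2$ tokens selected by the mask, split into chunks, then combined with the coarse output) can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' code: the function names `select_fine_tokens` and `merge_tokens` are hypothetical, a single-channel image stands in for the model's features, a patch counts as selected if any of its pixels is masked, and "combine" is modeled as overwriting coarse tokens with refined ones.

```python
import numpy as np


def select_fine_tokens(img, mask, patch=2, num_chunks=4):
    """Patchify `img`, keep the patches that overlap `mask`, and chunk them.

    Returns all tokens, the indices of selected tokens, and those indices
    split into `num_chunks` groups that could be processed independently.
    """
    h, w = img.shape
    gh, gw = h // patch, w // patch
    # (h, w) -> (gh, patch, gw, patch) -> (gh * gw, patch * patch)
    tokens = (
        img.reshape(gh, patch, gw, patch)
        .transpose(0, 2, 1, 3)
        .reshape(gh * gw, patch * patch)
    )
    # Assumption: a patch is selected if any pixel inside it is masked.
    patch_mask = mask.reshape(gh, patch, gw, patch).any(axis=(1, 3)).reshape(-1)
    idx = np.flatnonzero(patch_mask)
    chunks = np.array_split(idx, num_chunks)
    return tokens, idx, chunks


def merge_tokens(coarse_tokens, refined_tokens, idx):
    """Overwrite the coarse tokens at `idx` with their refined versions."""
    out = coarse_tokens.copy()
    out[idx] = refined_tokens
    return out


# Usage: only the top-left quadrant of an 8x8 image is marked as detail-rich.
img = np.arange(64, dtype=float).reshape(8, 8)
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :4] = True
tokens, idx, chunks = select_fine_tokens(img, mask, patch=2, num_chunks=2)
refined = tokens[idx] + 100.0  # stand-in for the downstream refinement
merged = merge_tokens(tokens, refined, idx)
```

Because only the selected indices are gathered, the cost of the stand-in refinement step scales with the number of masked patches rather than with the full image, which is the efficiency argument the caption makes.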