
TTRD3: Texture Transfer Residual Denoising Dual Diffusion Model for Remote Sensing Image Super-Resolution

Yide Liu, Haijiang Sun, Xiaowen Zhang, Qiaoyuan Liu, Zhouchang Chen, Chongzhuo Xiao

TL;DR

TTRD3 introduces a texture-aware framework for remote sensing image super-resolution (RSISR) that jointly optimizes geometric fidelity and perceptual realism. It combines a Residual Denoising Dual Diffusion Model with a Multi-scale Feature Aggregation Block and Sparse Texture Transfer Guidance, which exploits high-resolution (HR) texture priors from reference images of similar scenes without requiring strict alignment. On the AID and RSD46 datasets, the method achieves better (lower) LPIPS and FID while maintaining competitive PSNR/SSIM, indicating robustness across diverse remote sensing (RS) scenes. Practical variants and ablation studies show favorable trade-offs between reconstruction quality and efficiency, suggesting strong potential for real-world RS super-resolution. The authors also identify blind SR and degradation-aware adaptation as directions for future work.
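The TL;DR contrasts perceptual metrics (LPIPS, FID) with fidelity metrics (PSNR/SSIM). PSNR is a fixed function of the pixel-wise mean squared error, $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, so it measures pixel-level closeness to the HR image and is largely insensitive to whether generated texture looks realistic. A minimal sketch, assuming images scaled to [0, 1]; the paper's exact evaluation protocol (color space, border cropping) is not given in this summary.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    if reference.shape != estimate.shape:
        raise ValueError("images must have the same shape")
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


# A constant offset of 0.1 gives MSE = 0.01, i.e. 20 dB.
hr = np.zeros((8, 8, 3))
sr = hr + 0.1
print(round(psnr(hr, sr), 2))  # 20.0
```

Because PSNR depends only on MSE, a slightly blurred output can outscore a sharper one whose textures are plausible but not pixel-aligned, which is the fidelity/realism tension TTRD3 targets.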

Abstract

Remote Sensing Image Super-Resolution (RSISR) reconstructs high-resolution (HR) remote sensing images from low-resolution (LR) inputs to support fine-grained ground object interpretation. Existing methods face three key challenges: (1) difficulty in extracting multi-scale features from spatially heterogeneous RS scenes, (2) limited prior information, which causes semantic inconsistency in reconstructions, and (3) an imbalanced trade-off between geometric accuracy and visual quality. To address these issues, we propose the Texture Transfer Residual Denoising Dual Diffusion Model (TTRD3) with three innovations. First, a Multi-scale Feature Aggregation Block (MFAB) employs parallel heterogeneous convolutional kernels for multi-scale feature extraction. Second, a Sparse Texture Transfer Guidance (STTG) module transfers HR texture priors from reference images of similar scenes. Third, a Residual Denoising Dual Diffusion Model (RDDM) framework combines residual diffusion for deterministic reconstruction with noise diffusion for diverse generation. Experiments on multi-source RS datasets demonstrate TTRD3's superiority over state-of-the-art methods, with a 1.43% improvement in LPIPS and a 3.67% improvement in FID over the best-performing baselines. Code/model: https://github.com/LED-666/TTRD3.
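The split of RDDM into residual diffusion and noise diffusion can be illustrated with the forward process of the original Residual Denoising Diffusion Models formulation. With residual $I_{\text{res}} = I_{\text{in}} - I_0$, the forward process has the closed form $I_t = I_0 + \bar{\alpha}_t I_{\text{res}} + \bar{\beta}_t \varepsilon$, and choosing $\bar{\alpha}_T = 1$ gives $I_T = I_{\text{in}} + \bar{\beta}_T \varepsilon$, the dual-diffusion degraded image of Figure 2. This sketch assumes TTRD3 uses that formulation; the linear schedules, $T = 1000$, $\bar{\beta}_T = 1$, and the helper names `make_schedules` and `q_sample` are hypothetical choices for illustration.

```python
import numpy as np


def make_schedules(T: int = 1000, beta_bar_T: float = 1.0):
    """Hypothetical schedules: alpha_bar_t = t/T and beta_bar_t^2 = (t/T) * beta_bar_T^2."""
    frac = np.arange(1, T + 1) / T
    alpha_bar = frac                       # cumulative residual weight; alpha_bar_T = 1
    beta_bar = np.sqrt(frac) * beta_bar_T  # cumulative noise standard deviation
    return alpha_bar, beta_bar


def q_sample(i0, i_in, t, alpha_bar, beta_bar, rng):
    """Draw I_t = I_0 + alpha_bar_t * I_res + beta_bar_t * eps, I_res = I_in - I_0 (t is 1-based)."""
    i_res = i_in - i0
    eps = rng.standard_normal(i0.shape)
    return i0 + alpha_bar[t - 1] * i_res + beta_bar[t - 1] * eps, eps


rng = np.random.default_rng(0)
i0 = rng.random((3, 16, 16))                       # stand-in HR image I_0
i_in = i0 + 0.05 * rng.standard_normal(i0.shape)  # stand-in degraded input I_in
alpha_bar, beta_bar = make_schedules(T=1000, beta_bar_T=1.0)

i_T, eps = q_sample(i0, i_in, 1000, alpha_bar, beta_bar, rng)
# At t = T the residual is fully added: I_T = I_in + beta_bar_T * eps.
print(np.allclose(i_T, i_in + beta_bar[-1] * eps))  # True
```

Setting `beta_bar_T=0.0` removes the noise term, leaving a purely deterministic residual path from $I_0$ to $I_{\text{in}}$; a positive value adds the noise-diffusion path that the abstract credits with diverse generation.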

Paper Structure

This paper contains 41 sections, 25 equations, 16 figures, and 7 tables.

Figures (16)

  • Figure 1: The TTRD3 framework extracts structurally similar high-frequency spatial texture information from HR reference images, thereby guiding the RDDM to generate realistic high-frequency details.
  • Figure 2: The forward process and reverse inference process of the RDDM, which comprises two components: residual diffusion and noise diffusion. Here, $I_0$ represents the HR image, $I_T$ denotes the dual-diffusion degraded image, $I_{\text{in}}$ corresponds to the residual-degraded image (the LR image), $I_{\text{SR}}$ denotes the SR image, and $\varepsilon$ denotes Gaussian noise.
  • Figure 3: Overall framework of the proposed TTRD3. Here, LR↑ denotes the 4× upsampled LR image, Ref↓↑ is the degraded reference image obtained by downsampling and subsequent upsampling, and Ref is the high-quality reference image. The MFAM processes these inputs to extract features at three scales. The STTG then generates multi-scale sparse texture guidance maps from these features. Finally, the Guid Decoder integrates the texture guidance into the residual denoising U-Net for high-fidelity reconstruction.
  • Figure 4: The Multi-scale Feature Aggregation Block (MFAB).
  • Figure 5: Structure of the Convolutional Block Attention Module (CBAM): (a) channel attention module (CAM), (b) spatial attention module (SAM), (c) overall CBAM process.
  • ...and 11 more figures
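Figure 3 describes STTG as working from LR↑, the degraded reference Ref↓↑, and the HR reference Ref. Matching LR↑ against Ref↓↑ compares images with similar degradation, while texture is copied from the sharp Ref, so no pixel-level alignment between LR and Ref is needed. A minimal sketch of that idea, assuming a hard-attention scheme in the style of earlier reference-based SR work: the patch size `k`, cosine similarity, top-1 matching, confidence threshold `tau`, and the names `unfold` and `sparse_texture_transfer` are all hypothetical and not the paper's STTG module.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def unfold(feat: np.ndarray, k: int = 3) -> np.ndarray:
    """(C, H, W) -> (H*W, C*k*k): one zero-padded k x k patch vector per pixel."""
    c, h, w = feat.shape
    p = k // 2
    padded = np.pad(feat, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(padded, (k, k), axis=(1, 2))  # (C, H, W, k, k)
    return win.transpose(1, 2, 0, 3, 4).reshape(h * w, c * k * k)


def sparse_texture_transfer(lr_up, ref_down_up, ref, k=3, tau=0.5):
    """Hypothetical STTG-style guidance (hard attention, not the paper's module).

    lr_up:       (C, H, W) features of the upsampled LR image (queries)
    ref_down_up: (C, Hr, Wr) features of the degraded reference (keys)
    ref:         (C, Hr, Wr) features of the HR reference (values)
    """
    _, h, w = lr_up.shape
    q = unfold(lr_up, k)
    key = unfold(ref_down_up, k)
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    key = key / (np.linalg.norm(key, axis=1, keepdims=True) + 1e-8)
    sim = q @ key.T                                 # cosine similarity, (H*W, Hr*Wr)
    idx = sim.argmax(axis=1)                        # top-1 match per query position
    conf = sim[np.arange(sim.shape[0]), idx]        # similarity of that match
    values = ref.reshape(ref.shape[0], -1)[:, idx]  # HR features at matched positions
    mask = conf >= tau                              # sparsify: drop weak matches
    guidance = (values * conf * mask).reshape(ref.shape[0], h, w)
    return guidance, conf.reshape(h, w), mask.reshape(h, w)


# Sanity check: if queries and keys coincide, every position matches itself.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
ref_hr = rng.standard_normal((4, 8, 8))
g, conf, mask = sparse_texture_transfer(feat, feat, ref_hr)
print(g.shape, bool(mask.all()))  # (4, 8, 8) True
```

The similarity matrix grows with the product of query and reference positions, so this brute-force version only suits small feature maps; a practical implementation would need a restricted or approximate search.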
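Figure 4's MFAB extracts multi-scale features with parallel heterogeneous convolution kernels. A minimal NumPy sketch of that general pattern, in which small kernels respond to fine local structure and larger kernels aggregate wider context: the kernel sizes (1, 3, 5, 7), per-branch ReLU, channel concatenation, 1×1 fusion, residual connection, random weights, and the names `conv2d_same` and `mfab_like` are assumptions for illustration, not the paper's exact block.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive zero-padded 'same' convolution (cross-correlation, as in DL frameworks).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("oikl,ihwkl->ohw", w, win)


def mfab_like(x, branch_weights, fuse_weight):
    """Hypothetical multi-scale block: parallel convs with different kernel sizes,
    ReLU, channel concatenation, 1x1 fusion, and a residual connection."""
    branches = [np.maximum(conv2d_same(x, w), 0.0) for w in branch_weights]
    stacked = np.concatenate(branches, axis=0)    # (n_branches * C_b, H, W)
    return x + conv2d_same(stacked, fuse_weight)  # fuse back to C channels


rng = np.random.default_rng(0)
C, C_b, sizes = 8, 4, (1, 3, 5, 7)  # hypothetical widths and kernel sizes
branch_weights = [0.1 * rng.standard_normal((C_b, C, k, k)) for k in sizes]
fuse_weight = 0.1 * rng.standard_normal((C, C_b * len(sizes), 1, 1))
x = rng.standard_normal((C, 16, 16))
y = mfab_like(x, branch_weights, fuse_weight)
print(y.shape)  # (8, 16, 16)
```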
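Figure 5 refers to CBAM, which refines a feature map with a channel attention module (CAM) followed by a spatial attention module (SAM). CAM passes globally average-pooled and max-pooled channel descriptors through a shared two-layer MLP; SAM convolves the channel-wise mean and max maps with a 7×7 kernel. A minimal NumPy sketch of that standard computation with random, untrained weights and biases omitted; the reduction ratio r = 2 is chosen only so the toy channel count divides evenly (CBAM's usual default is 16), and how TTRD3 places CBAM inside its network is not stated in this summary.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(x, w1, w2):
    """CAM: shared MLP over average- and max-pooled descriptors -> (C,) weights."""
    avg, mx = x.mean(axis=(1, 2)), x.max(axis=(1, 2))

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # C -> C/r -> C

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(x, w_sp):
    """SAM: channel-wise mean and max maps -> k x k conv -> (H, W) weights."""
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = w_sp.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))
    win = sliding_window_view(padded, (k, k), axis=(1, 2))  # (2, H, W, k, k)
    return sigmoid(np.einsum("ckl,chwkl->hw", w_sp, win))


def cbam(x, w1, w2, w_sp):
    """CBAM: reweight channels first, then spatial positions."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, w_sp)[None, :, :]


rng = np.random.default_rng(0)
C, r, k = 8, 2, 7
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
w_sp = 0.1 * rng.standard_normal((2, k, k))
x = rng.standard_normal((C, 16, 16))
y = cbam(x, w1, w2, w_sp)
print(y.shape)  # (8, 16, 16)
```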