Wavelet-Assisted Multi-Frequency Attention Network for Pansharpening

Jie Huang, Rui Huang, Jinghao Xu, Siran Peng, Yule Duan, Liangjian Deng

TL;DR

This work tackles pansharpening with a frequency-domain fusion strategy that preserves both spectral and spatial details. It proposes WFANet, which combines Multi-Frequency Fusion Attention (MFFA) with a Spatial Detail Enhancement Module (SDEM) in a wavelet pyramid to fuse PAN and LRMS features across multiple scales. MFFA builds a Frequency Attention Triplet (Frequency-Query, Spatial-Key, and Fusion-Value) and relies on the inverse discrete wavelet transform (IDWT) for lossless reconstruction. The approach achieves state-of-the-art results on the WV3, GF2, and QB datasets in both reduced- and full-resolution settings. Ablations validate the importance of each component: the DWT-based frequency separation, the attention design, the frequency adaptation provided by the Frequency Adaptation Block (FAB), and the multi-scale training strategy. The result is a practical, generalizable framework for high-quality pansharpening, with strong potential for real-world remote sensing applications thanks to its frequency-aware fusion and progressive reconstruction.

Abstract

Pansharpening aims to combine a high-resolution panchromatic (PAN) image with a low-resolution multispectral (LRMS) image to produce a high-resolution multispectral (HRMS) image. Although pansharpening in the frequency domain offers clear advantages, most existing methods either operate solely in the spatial domain or fail to fully exploit the frequency domain. To address this issue, we propose Multi-Frequency Fusion Attention (MFFA), which uses wavelet transforms to cleanly separate frequencies and to enable lossless reconstruction across different frequency domains. We then generate a Frequency-Query, a Spatial-Key, and a Fusion-Value based on the physical meaning of the different features, which captures frequency-specific information more effectively. We also take care to preserve frequency features across the different operations in the network. At the architectural level, our network employs a wavelet pyramid to progressively fuse information across multiple scales. Compared with previous frequency-domain approaches, our network better prevents the confusion and loss of different frequency features during fusion. Quantitative and qualitative experiments on multiple datasets demonstrate that our method outperforms existing approaches and generalizes well to real-world scenarios.
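
The frequency separation and lossless reconstruction mentioned in the abstract can be illustrated with a single-level 2D Haar wavelet transform. The sketch below is only an illustration in plain Python: the choice of the Haar wavelet, the function names (`haar_dwt2`, `haar_idwt2`, `wavelet_pyramid`), and the pyramid construction by recursing on the low-frequency band are assumptions, not details taken from the paper.

```python
def haar_dwt2(img):
    """Single-level 2D orthonormal Haar DWT of an even-sized image (list of rows).

    Returns four sub-bands (LL, LH, HL, HH), each half the input height and width.
    Sub-band naming conventions vary; here LH holds vertical differences and
    HL holds horizontal differences.
    """
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            ll.append((a + b + c + d) / 2)  # local average (low frequency)
            lh.append((a + b - c - d) / 2)  # top minus bottom
            hl.append((a - b + c - d) / 2)  # left minus right
            hh.append((a - b - c + d) / 2)  # diagonal difference
        LL.append(ll); LH.append(lh); HL.append(hl); HH.append(hh)
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2: rebuilds the full-resolution image."""
    h, w = len(LL), len(LL[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            ll, lh, hl, hh = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            img[2 * i][2 * j] = (ll + lh + hl + hh) / 2
            img[2 * i][2 * j + 1] = (ll + lh - hl - hh) / 2
            img[2 * i + 1][2 * j] = (ll - lh + hl - hh) / 2
            img[2 * i + 1][2 * j + 1] = (ll - lh - hl + hh) / 2
    return img


def wavelet_pyramid(img, levels):
    """Repeatedly apply the DWT to the LL band (one common pyramid construction).

    Returns the coarsest LL band and a list of (LH, HL, HH) tuples, finest first.
    """
    details, current = [], img
    for _ in range(levels):
        current, LH, HL, HH = haar_dwt2(current)
        details.append((LH, HL, HH))
    return current, details


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
LL, LH, HL, HH = haar_dwt2(image)
restored = haar_idwt2(LL, LH, HL, HH)
coarse, details = wavelet_pyramid(image, 2)
```

For this integer-valued input, `restored` equals `image` exactly, which is the sense in which the transform pair is lossless: the four sub-bands together carry all of the original information, so features can be processed per frequency band and then recombined.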

Paper Structure

This paper contains 31 sections, 17 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Comparison of four methods along two dimensions (convolution vs. attention, spatial domain vs. frequency domain): (a) a convolutional network in the spatial domain, (b) a convolutional network in different frequency domains, (c) an attention mechanism in the spatial domain, and (d) our proposed method. The comparison forms the primary motivation for this paper: 1) using wavelet transforms to process features in different frequency domains; 2) designing an attention method with clear physical significance to leverage the advantages of frequency-domain processing.
  • Figure 2: (a) DWT decomposes the image into four different frequency components. IDWT is the lossless inverse process of DWT. Multiple applications of DWT produce a multi-scale wavelet pyramid. (b) Simplified illustration of MFFA. Fusion-Value, Spatial-Key, and Frequency-Query are derived from the information indicated by the arrows. These components are then processed through an attention mechanism, enabling the reconstruction of features across different frequencies that integrate both spectral and spatial information.
  • Figure 3: The overall workflow of our WFANet. Our network processes the data using multiple scales (only two scales are illustrated here for simplicity). WFANet consists of two sub-modules: the Multi-Frequency Fusion Attention (MFFA) and the Spatial Detail Enhancement Module (SDEM). The illustration of frequency features is shown on both sides of the figure.
  • Figure 4: The MFFA workflow involves two phases. First, in the FATG phase, the Frequency Attention Triplet with specific physical significance is generated. Then, in the ADFR phase, the obtained Frequency Attention Triplet is processed to reconstruct the features at different frequencies. $Q$, $S$, and $I$ are shown as four colored blocks, representing features from different frequency domains after DWT. The data dimensions are exemplified using the largest scale.
  • Figure 5: Comparison of two network architectures for the SDEM: (a) Frequency Adaptation Block (FAB), which is used in the SDEM. (b) Convolution Block (CB).
  • ...and 8 more figures
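
The captions for Figures 2 and 4 describe an attention mechanism driven by a Frequency-Query, a Spatial-Key, and a Fusion-Value. This summary does not say how the triplet is computed, so the following is only a generic scaled dot-product cross-attention sketch in plain Python. The function names, the toy token values, and the mapping of each variable to a role in the triplet are hypothetical, and the learned projections a real network would use are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, key, value):
    """Generic scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    query: n_q rows of length d
    key:   n_k rows of length d
    value: n_k rows of length d_v
    Returns n_q rows of length d_v.
    """
    d = len(query[0])
    out = []
    for q in query:
        # Similarity of this query token to every key token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in key]
        weights = softmax(scores)
        # Weighted average of the value tokens.
        out.append([
            sum(w * v[c] for w, v in zip(weights, value))
            for c in range(len(value[0]))
        ])
    return out


# Hypothetical toy tokens; the roles below are assumptions for illustration only.
frequency_query = [[1.0, 0.0], [0.0, 1.0]]            # e.g. tokens from one sub-band
spatial_key = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # e.g. tokens from PAN features
fusion_value = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]] # e.g. tokens from fused features

attended = cross_attention(frequency_query, spatial_key, fusion_value)
```

Each output row is a convex combination of the value rows, weighted by how strongly the corresponding query matches each key. In a design like the one the captions describe, such an operation would let frequency-specific queries select which fused content to pass on for reconstruction.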