Frequency-Adaptive Pan-Sharpening with Mixture of Experts

Xuanhua He, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

TL;DR

The paper tackles pan-sharpening by introducing frequency-adaptive processing through a learnable frequency mask and a mixture-of-experts (MOE) framework. The method, FAME-Net, uses a DCT-based frequency mask predictor to partition features into high- and low-frequency streams (HF-MOE and LF-MOE) and an Experts Mixture module to dynamically fuse these streams with PAN/MS information. Key contributions include the first integration of MOE with frequency-domain processing in pan-sharpening, a differentiable Gumbel-Softmax-based mask, and a composite loss with reconstruction, mask, and load terms that balance expert utilization. Experimental results on WV2, GF2, and WV3 demonstrate state-of-the-art performance and strong generalization to real-world scenes, supported by ablations and visualizations of frequency-aware feature maps.
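
The differentiable Gumbel-Softmax mask mentioned above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `gumbel_softmax`, the two-class (low, high) layout of the logits, the 4x4 grid, and the temperature value are all assumptions chosen for illustration. It only shows how adding Gumbel noise to logits and applying a tempered softmax gives a soft, near-binary assignment that gradients could pass through in an autodiff framework.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed categorical sample over the last axis (Gumbel-Softmax)."""
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise by inverse transform sampling; clip to avoid log(0).
    u = np.clip(rng.random(logits.shape), 1e-10, 1.0 - 1e-10)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits: for each of 4x4 positions, scores for (low, high).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 2))
soft = gumbel_softmax(logits, tau=0.5, rng=rng)
low_mask = soft[..., 0]   # soft membership in the low-frequency stream
high_mask = soft[..., 1]  # soft membership in the high-frequency stream
```

Lower values of `tau` push the soft mask closer to a hard 0/1 choice, at the cost of noisier gradients.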

Abstract

Pan-sharpening involves reconstructing missing high-frequency information in multi-spectral images with low spatial resolution, using a higher-resolution panchromatic image as guidance. Despite this inherent connection with the frequency domain, existing pan-sharpening research has rarely investigated frequency-domain solutions. To this end, we propose a novel Frequency Adaptive Mixture of Experts (FAME) learning framework for pan-sharpening, which consists of three key components: the Adaptive Frequency Separation Prediction Module, the Sub-Frequency Learning Expert Module, and the Expert Mixture Module. Specifically, the first leverages the discrete cosine transform to perform frequency separation by predicting a frequency mask. Based on the generated mask, the second, comprising a low-frequency MOE and a high-frequency MOE, enables effective reconstruction of low-frequency and high-frequency information. Finally, the fusion module dynamically weights the high-frequency and low-frequency MOE knowledge to adapt to remote sensing images with significant content variations. Quantitative and qualitative experiments on multiple datasets demonstrate that our method performs best against other state-of-the-art methods and generalizes well to real-world scenes. Code will be made publicly available at https://github.com/alexhe101/FAME-Net.
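
To make the idea of DCT-based frequency separation concrete, here is a minimal NumPy sketch. It uses a fixed square cutoff in the upper-left corner of the DCT spectrum rather than a predicted mask, so it illustrates the general technique and not the paper's learned predictor. The helper names `dct_matrix` and `split_frequencies`, the 8x8 test image, and the `cutoff` value are assumptions introduced for this example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalisation
    return c

def split_frequencies(img, cutoff):
    """Split a 2-D array into low- and high-frequency parts via the DCT."""
    h, w = img.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    spec = ch @ img @ cw.T                # forward 2-D DCT
    low_sel = np.zeros((h, w), dtype=bool)
    low_sel[:cutoff, :cutoff] = True      # upper-left block holds low frequencies
    low = ch.T @ (spec * low_sel) @ cw    # inverse 2-D DCT of the kept block
    high = ch.T @ (spec * ~low_sel) @ cw  # inverse 2-D DCT of everything else
    return low, high

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))             # hypothetical single-band image
low, high = split_frequencies(img, cutoff=3)
flat_low, flat_high = split_frequencies(np.full((8, 8), 2.0), cutoff=1)
```

Because the transform is orthonormal and the two selections are complementary, `low + high` reconstructs the input, and a constant image ends up entirely in the low-frequency part.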

Paper Structure

This paper contains 17 sections, 12 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: Generation process of frequency mask. Firstly, a discrete cosine transform is applied to the image. Then, the upper left part of the DCT spectrum is masked using manually selected thresholds. Finally, the frequency mask is generated through inverse transformation.
  • Figure 2: The overall structure of FAME-Net, which is composed of three main components: the Mask Predictor, the Frequency Experts Module, and the Experts Mixture Module.
  • Figure 3: The architecture of the Frequency Experts Module. The frequency mask splits $F_c$ into high-frequency and low-frequency parts, which are processed separately by HF-MOE and LF-MOE.
  • Figure 4: The architecture of Experts Mixture module, which includes the gating mechanism responsible for generating sparse weights based on input features, and the selection of appropriate fusion expert outputs based on the weights.
  • Figure 5: Results of our approach compared against nine other methods on the WorldView-III dataset.
  • ...and 1 more figures
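
The gating mechanism described in the Figure 4 caption, which produces sparse weights from input features and uses them to select expert outputs, can be sketched roughly as follows. The top-k rule, the linear gate `gate_w`, the toy scaling "experts", and the function names `top_k_gate` and `mix_experts` are all assumptions made for illustration. The paper's actual gate and experts may differ.

```python
import numpy as np

def top_k_gate(scores, k):
    """Keep the k largest scores per row, softmax over them, zero elsewhere."""
    idx = np.argsort(scores, axis=-1)[:, -k:]
    kept = np.take_along_axis(scores, idx, axis=-1)
    kept = np.exp(kept - kept.max(axis=-1, keepdims=True))
    kept /= kept.sum(axis=-1, keepdims=True)
    weights = np.zeros_like(scores)
    np.put_along_axis(weights, idx, kept, axis=-1)
    return weights

def mix_experts(x, experts, gate_w, k=2):
    """Fuse expert outputs with sparse, input-dependent weights."""
    scores = x @ gate_w                                  # (batch, n_experts)
    weights = top_k_gate(scores, k)
    outputs = np.stack([f(x) for f in experts], axis=1)  # (batch, n_experts, dim)
    return (weights[..., None] * outputs).sum(axis=1), weights

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))                              # hypothetical features
experts = [lambda v, s=s: s * v for s in (0.5, 1.0, 2.0, 4.0)]  # toy experts
gate_w = rng.normal(size=(8, 4))                         # hypothetical gate weights
fused, weights = mix_experts(x, experts, gate_w, k=2)
```

For clarity this sketch evaluates every expert and then zeroes the unselected ones. An efficient implementation would run only the selected experts for each input.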