Bi^2MAC: Bimodal Bi-Adaptive Mask-Aware Convolution for Remote Sensing Pansharpening

Xianghong Xiao, Zeyu Xia, Zhou Fei, Jinliang Xiao, Haorui Chen, Liangjian Deng

TL;DR

This work tackles pansharpening for remote sensing by addressing regional heterogeneity in feature representations. It introduces Bi^2MAC, a bimodal, bi-adaptive mask-aware convolution that uses a Content-Aware Mask Generator and dual-path Mask-Aware Convolution to allocate computation between redundant and heterogeneous regions, aided by independent low-rank kernels. The Bi^2MAC module is integrated into a UNet-style Bi^2MANet, achieving state-of-the-art results across WV3, QB, and GF2 benchmarks while reducing computational cost and parameter counts. Comprehensive ablations, efficiency analyses, and mask visualizations validate the design, demonstrating practical impact as a plug-and-play adaptive operator for high-quality, spectrally faithful pansharpening.

Abstract

Pansharpening aims to fuse a high-resolution panchromatic (PAN) image with a low-resolution multispectral (LRMS) image to generate a high-resolution multispectral (HRMS) image. Conventional deep learning-based methods are inherently limited in their ability to adapt to regional heterogeneity within feature representations. Although various adaptive convolution methods have been proposed to address this limitation, they often suffer from excessive computational cost and a limited ability to capture heterogeneous regions in remote sensing images. To overcome these challenges, we propose Bimodal Bi-Adaptive Mask-Aware Convolution (Bi^2MAC), which exploits information from different types of regions while allocating computational resources according to region type. Specifically, we design a lightweight module that generates both soft and hard masks: the soft masks preliminarily modulate the input features, and the hard masks guide different types of regions into separate processing branches. Redundant features are directed to a compact branch for low-cost global processing, whereas heterogeneous features are routed to a focused branch that invests more computational resources in fine-grained modeling. Extensive experiments on multiple benchmark datasets demonstrate that Bi^2MAC achieves state-of-the-art (SOTA) performance while requiring substantially less training time and fewer parameters, and incurring the lowest computational cost among adaptive convolution models.
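The soft/hard mask routing described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the mask generator (local deviation from a 3x3 mean), the threshold, the rank-1 channel mixing used as the "compact" branch, and the unsharp filter used as the "focused" branch are all hypothetical stand-ins chosen only to make the data flow concrete. For readability both branches are evaluated densely and then selected per pixel, so the sketch shows the routing logic, not the computational savings.

```python
import numpy as np

def box_filter3(x):
    """3x3 mean filter over the spatial axes of a (C, H, W) array, edge-padded."""
    _, h, w = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    return sum(p[:, i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def dual_branch_routing(x, threshold=0.5):
    """Route each pixel of x (C, H, W) to one of two branches using a hard mask."""
    # Hypothetical mask generator: mean absolute deviation from the local mean.
    detail = np.abs(x - box_filter3(x)).mean(axis=0)       # (H, W)
    soft = detail / (detail.max() + 1e-8)                   # soft mask in [0, 1)
    hard = soft > threshold                                 # True -> focused branch

    # The soft mask reweights the features before branching.
    modulated = x * (1.0 + soft)

    # Stand-in "compact" branch: rank-1 channel mixing (cheap, no spatial kernel).
    c = x.shape[0]
    w_low = np.outer(np.ones(c), np.full(c, 1.0 / c))       # rank-1 (C, C) matrix
    compact = np.tensordot(w_low, modulated, axes=1)        # (C, H, W)

    # Stand-in "focused" branch: 3x3 unsharp filter that emphasizes local detail.
    focused = modulated + (modulated - box_filter3(modulated))

    # The hard mask selects, per pixel, which branch output is kept.
    out = np.where(hard[None], focused, compact)
    return out, soft, hard

# Toy input: left half flat (redundant), right half checkerboard (heterogeneous).
x = np.zeros((2, 8, 8))
ii, jj = np.indices((8, 4))
x[:, :, 4:] = (ii + jj) % 2
out, soft, hard = dual_branch_routing(x)
print(hard.astype(int))
```

On this toy input the flat columns fall below the threshold and go to the compact branch, while part of the checkerboard region is assigned to the focused branch. A practical version would gather only the masked pixels before running the expensive branch.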

Paper Structure

This paper contains 33 sections, 14 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: (a) Conventional deep learning–based pansharpening methods. (b) The proposed Bi$^{2}$MAC adaptively assigns pixels to different branches via a content-aware mask, enabling specialized feature modeling. (c) Within each branch, Bi$^{2}$MAC generates adaptive convolution kernels based on pixel statistics, dynamically adjusting receptive fields and weight distributions to capture regional heterogeneity better.
  • Figure 2: We analyze image patches from typical redundant (e.g., rooftops, water) and complex (e.g., edges, textures) regions. SVD and Fourier analyses validate our core motivation: redundant regions are low-rank and dominated by low frequencies, while complex regions are the opposite, which motivates our dual-path Bi$^{2}$MAC architecture.
  • Figure 3: Overview of the proposed Bi$^{2}$MAC model. The model consists of two key components: a Content-Aware Mask Generator (CAMG) and a Mask-Aware Bimodal Adaptive Convolution (MABiC). The CAMG first produces hard masks, which guide each pixel into different branches for adaptive processing. The figure illustrates the overall data flow, as well as the generation of masks, adaptive kernels, and low-rank kernels.
  • Figure 4: Fused images and error maps on the GF2 dataset (reduced data). In the error maps, blue indicates low error.
  • Figure 5: Visualization of the soft mask flat ($SM_F$) heatmaps and hard mask ($HM$) maps generated by Bi$^{2}$MAC at different training stages and network depths. Scale denotes the downsampling ratio in the U-Net. The color intensity in $SM_F$ reflects the model’s attention level to each region, while the black-and-white pattern in $HM$ indicates the branch assignment for pixel-wise processing.
  • ...and 2 more figures
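
The kind of SVD and Fourier analysis mentioned in the Figure 2 caption can be sketched as follows. This is an illustrative example only: the synthetic patches, the 99% energy criterion, the frequency radius, and the helper names `energy_rank` and `high_freq_ratio` are assumptions for demonstration and are not taken from the paper.

```python
import numpy as np

def energy_rank(patch, keep=0.99):
    """Number of singular values needed to retain `keep` of the squared energy."""
    s = np.linalg.svd(patch, compute_uv=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, keep) + 1)

def high_freq_ratio(patch, radius=4):
    """Fraction of spectral power farther than `radius` bins from the DC term."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(patch))) ** 2
    h, w = patch.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h // 2, xx - w // 2)   # distance to the shifted DC bin
    return float(power[dist > radius].sum() / power.sum())

# Synthetic stand-ins for the two region types (not real remote sensing data).
rng = np.random.default_rng(0)
flat_patch = np.outer(np.ones(16), np.linspace(0.4, 0.6, 16))   # smooth ramp
textured_patch = rng.random((16, 16))                            # noisy texture

for name, patch in [("flat", flat_patch), ("textured", textured_patch)]:
    print(name, energy_rank(patch), high_freq_ratio(patch))
```

Under these assumptions the smooth patch needs fewer singular values and carries a smaller share of high-frequency power than the textured patch, which is the qualitative contrast the caption describes.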