Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening

Yule Duan, Xiao Wu, Haoyu Deng, Liang-Jian Deng

TL;DR

CANConv is a content-adaptive non-local convolution for remote sensing pansharpening: it clusters image regions and applies one adaptive kernel per cluster. The similarity relationship partition (SRP) sub-module finds non-local self-similarity by clustering unfolded neighborhood features. The partition-wise adaptive convolution (PWAC) sub-module then generates a kernel and a bias for each cluster from its centroid, so that pixels in the same cluster share parameters regardless of how far apart they are: Y_{xy} = p_{xy} ⊗ f_k(c_{I_{xy}}) + f_b(c_{I_{xy}}), where p_{xy} is the unfolded neighborhood at position (x, y), I_{xy} is its cluster index, c_{I_{xy}} is the centroid of that cluster, and f_k and f_b generate the kernel and the bias. Built on this module, CANNet adopts a U-Net-style architecture to exploit multi-scale self-similarity and reports state-of-the-art pansharpening results on the WV3, QB, and GF2 datasets. The paper supports these results with ablation studies, an analysis of K-Means versus KNN, and a discussion of backpropagation through the module, and the source code is openly available.
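
The formula above can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the input is a single-channel map, neighborhoods are 3×3 with zero padding, SRP is approximated by a plain fixed-iteration K-Means on the raw unfolded patches (any feature reduction before clustering is omitted), f_k and f_b are fixed random linear maps standing in for learned layers, and ⊗ is read as the inner product between a flattened patch and its kernel. All function names (unfold, nearest, kmeans, can_conv) are hypothetical.

```python
import numpy as np

def unfold(x, k=3):
    """Return an (H*W, k*k) array of zero-padded k x k neighborhoods of a 2-D map."""
    pad = k // 2
    xp = np.pad(x, pad)
    h, w = x.shape
    patches = np.empty((h * w, k * k), dtype=float)
    for i in range(h):
        for j in range(w):
            patches[i * w + j] = xp[i:i + k, j:j + k].ravel()
    return patches

def nearest(samples, centroids):
    """Index of the closest centroid (squared Euclidean distance) for each sample."""
    dist = ((samples[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

def kmeans(samples, n_clusters, n_iter=10, seed=0):
    """Plain K-Means with a fixed iteration count (stand-in for SRP clustering)."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(samples), size=n_clusters, replace=False)
    centroids = samples[start].astype(float)
    for _ in range(n_iter):
        idx = nearest(samples, centroids)
        for c in range(n_clusters):
            members = samples[idx == c]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[c] = members.mean(axis=0)
    return nearest(samples, centroids), centroids

def can_conv(x, n_clusters=4, k=3, seed=0):
    """Cluster-wise adaptive filtering: Y_xy = p_xy . f_k(c_I) + f_b(c_I)."""
    h, w = x.shape
    p = unfold(x, k)                                  # p_xy, one row per pixel
    idx, c = kmeans(p, n_clusters, seed=seed)         # I_xy and the centroids c
    rng = np.random.default_rng(seed + 1)
    w_k = rng.standard_normal((k * k, k * k)) * 0.1   # assumed stand-in for f_k
    w_b = rng.standard_normal(k * k) * 0.1            # assumed stand-in for f_b
    kernels = c @ w_k                                 # one kernel per cluster
    biases = c @ w_b                                  # one bias per cluster
    y = (p * kernels[idx]).sum(axis=1) + biases[idx]  # each pixel uses its cluster's kernel
    return y.reshape(h, w), idx.reshape(h, w)

if __name__ == "__main__":
    image = np.random.default_rng(42).random((8, 8))
    output, cluster_map = can_conv(image, n_clusters=4)
    print(output.shape, np.unique(cluster_map))
```

Because kernels are generated once per cluster rather than once per pixel, the cost of kernel generation scales with the number of clusters, while distant pixels with similar neighborhoods still share the same filter.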

Abstract

Currently, machine learning-based methods for remote sensing pansharpening have progressed rapidly. However, existing pansharpening methods often do not fully exploit differentiating regional information in non-local spaces, thereby limiting the effectiveness of the methods and resulting in redundant learning parameters. In this paper, we introduce a so-called content-adaptive non-local convolution (CANConv), a novel method tailored for remote sensing image pansharpening. Specifically, CANConv employs adaptive convolution, ensuring spatial adaptability, and incorporates non-local self-similarity through the similarity relationship partition (SRP) and the partition-wise adaptive convolution (PWAC) sub-modules. Furthermore, we also propose a corresponding network architecture, called CANNet, which mainly utilizes the multi-scale self-similarity. Extensive experiments demonstrate the superior performance of CANConv, compared with recent promising fusion methods. Besides, we substantiate the method's effectiveness through visualization, ablation experiments, and comparison with existing methods on multiple test sets. The source code is publicly available at https://github.com/duanyll/CANConv.

Paper Structure

This paper contains 21 sections, 10 equations, 16 figures, and 9 tables.

Figures (16)

  • Figure 1: (a) Pansharpening involves fusing the PAN and LRMS images into an HRMS image. (b) A toy example of partitioned regions and (c) their corresponding content-adaptive convolution kernels, which illustrate the motivation of this paper: 1) regions with different content should be filtered by distinct kernels; 2) non-local regions with the same content are extracted and represented by a single shared convolution kernel.
  • Figure 2: Overall workflow of four convolution methods related to adaptivity and non-locality. (a) Global adaptive/standard convolution [jiaDynamicFilterNetworks2016]. (b) Spatial adaptive convolution [suPixelAdaptiveConvolutionalNeural2019, zhouDecoupledDynamicFilter2021, jinLAGConvLocalContextAdaptive2022]. (c) Graph convolution [liCrossPatchGraphConvolutional2021, zhouCrossScaleInternalGraph2020]. (d) The proposed method.
  • Figure 3: The overall workflow for a CANConv module. CANConv consists of two sub-modules: Similarity Relationship Partition (SRP) and Partition-Wise Adaptive Convolution (PWAC). In SRP, the input feature map is unfolded and reduced to obtain samples for clustering. PWAC is applied separately for each cluster distinguished by SRP. The figure demonstrates how PWAC adaptively generates convolution kernels and bias for a single cluster in the feature map.
  • Figure 4: The overall architecture of CANNet. CANNet follows the classic U-Net design and features CAN-ResBlocks. Black arrows indicate the flow of feature maps, while blue arrows indicate the flow of cluster index matrices. The downsampling module halves the spatial resolution while doubling the number of channels, and the upsampling module does the opposite. Tail CAN-ResBlocks reuse cluster index matrices obtained in the Head CAN-ResBlock at the same level.
  • Figure 5: Qualitative comparison of representative methods on the WV3 reduced-resolution dataset. The first row presents RGB outputs, while the second row shows the residual with respect to the ground truth. Refer to the supplementary material for more comparisons.
  • ...and 11 more figures
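
The shape bookkeeping described in the Figure 4 caption can be traced with a small Python sketch. It is illustrative only: the number of levels, the base channel count, the input size, and every name (FeatureShape, downsample, upsample, trace_unet, the "head"/"bottom"/"tail" stage labels) are assumptions made for the example and are not taken from the paper or its code. The sketch shows why a tail block can plausibly reuse the cluster index matrix of the head block at the same level: after the matching upsampling step, the feature map has the same spatial size again, so each position maps onto one stored index.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureShape:
    channels: int
    height: int
    width: int

def downsample(s):
    """Halve the spatial resolution and double the channels."""
    return FeatureShape(s.channels * 2, s.height // 2, s.width // 2)

def upsample(s):
    """Double the spatial resolution and halve the channels."""
    return FeatureShape(s.channels // 2, s.height * 2, s.width * 2)

def trace_unet(base, levels):
    """Return (stage, shape, cluster-index source) tuples for a U-Net-style pass."""
    trace = []
    index_shape = {}  # level -> spatial size of the stored cluster index matrix
    shape = base
    # Encoder: each head stage computes and stores its cluster indices.
    for level in range(levels - 1):
        index_shape[level] = (shape.height, shape.width)
        trace.append((f"head{level}", shape, "computed by SRP"))
        shape = downsample(shape)
    trace.append(("bottom", shape, "computed by SRP"))
    # Decoder: each tail stage reuses the indices stored at the same level.
    for level in range(levels - 2, -1, -1):
        shape = upsample(shape)
        if index_shape[level] != (shape.height, shape.width):
            raise ValueError(f"index matrix at level {level} does not fit")
        trace.append((f"tail{level}", shape, f"reused from head{level}"))
    return trace

if __name__ == "__main__":
    for stage, shape, source in trace_unet(FeatureShape(32, 64, 64), levels=3):
        print(stage, shape, source)
```

Reusing the indices means clustering runs once per level rather than once per block, which is the saving the caption points to.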