Adaptive Rectangular Convolution for Remote Sensing Pansharpening

Xueyang Wang, Zhixin Zheng, Jiandong Shao, Yule Duan, Liang-Jian Deng

TL;DR

Remote sensing pansharpening is hampered by fixed-square convolution and fixed sampling points, which fail to capture objects at varying scales. The authors introduce Adaptive Rectangular Convolution (ARConv), which learns kernel height $h$ and width $w$ and dynamically selects the number of sampling points, generating a sampling map and using bilinear interpolation with an affine transformation to enhance spatial flexibility. Built as ARNet by replacing standard convolutions in a U-Net with ARConv, the approach achieves strong performance on WV3, QB, and GF2 across reduced- and full-resolution datasets, with ablations and visualizations confirming the benefits of height/width adaptation, sampling-point adaptation, and affine transformation. The work demonstrates robust, per-object-scale feature extraction and offers a practical plug-and-play module for improving pansharpening quality in diverse remote sensing scenarios.

Abstract

Recent advancements in convolutional neural network (CNN)-based techniques for remote sensing pansharpening have markedly enhanced image quality. However, conventional convolutional modules in these methods have two critical drawbacks. First, the sampling positions in convolution operations are confined to a fixed square window. Second, the number of sampling points is preset and remains unchanged. Given the diverse object sizes in remote sensing images, these rigid parameters lead to suboptimal feature extraction. To overcome these limitations, we introduce an innovative convolutional module, Adaptive Rectangular Convolution (ARConv). ARConv adaptively learns both the height and width of the convolutional kernel and dynamically adjusts the number of sampling points based on the learned scale. This approach enables ARConv to effectively capture scale-specific features of various objects within an image, optimizing kernel sizes and sampling locations. Additionally, we propose ARNet, a network architecture in which ARConv is the primary convolutional module. Extensive evaluations across multiple datasets reveal the superiority of our method in enhancing pansharpening performance over previous techniques. Ablation studies and visualization further confirm the efficacy of ARConv.
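The sampling idea described above can be illustrated with a minimal sketch: spread a grid of sampling points over a rectangle of a given height and width, read the feature map at those real-valued positions with bilinear interpolation, and take a weighted sum. This is not the authors' implementation. The function names (`bilinear_sample`, `rect_offsets`, `rect_conv_at`), the convention that the rectangle's height and width measure the span between the outermost sampling points, and the zero padding outside the image are all assumptions made for illustration. In ARConv the height, width, and number of points are learned or selected per feature map; here they are passed in by hand, and the affine transformation is omitted.

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2D list `img` at real-valued (y, x).

    Positions outside the image contribute zero (assumed zero padding).
    """
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    # Visit the four integer neighbours with their bilinear weights.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def rect_offsets(height, width, k_h, k_w):
    """Offsets of a k_h x k_w grid spread evenly over a rectangle.

    The rectangle is centred at the origin; `height` and `width` are the
    spans between the outermost points (an illustrative convention).
    """
    def axis(extent, k):
        if k == 1:
            return [0.0]
        step = extent / (k - 1)
        return [-extent / 2 + i * step for i in range(k)]

    return [(dy, dx) for dy in axis(height, k_h) for dx in axis(width, k_w)]


def rect_conv_at(img, center, height, width, weights):
    """Weighted sum of bilinear samples on a rectangular grid at `center`.

    `weights` is a k_h x k_w nested list, so the number of sampling
    points follows from its shape.
    """
    k_h, k_w = len(weights), len(weights[0])
    offsets = rect_offsets(height, width, k_h, k_w)
    flat = [wt for row in weights for wt in row]
    cy, cx = center
    return sum(
        wt * bilinear_sample(img, cy + dy, cx + dx)
        for wt, (dy, dx) in zip(flat, offsets)
    )


if __name__ == "__main__":
    # Toy feature map: a linear ramp, value = 10 * row + column.
    ramp = [[10 * r + c for c in range(8)] for r in range(8)]
    # A 3 x 5 averaging kernel over a rectangle of height 3 and width 5.
    avg = [[1.0 / 15] * 5 for _ in range(3)]
    print(rect_conv_at(ramp, (3.5, 3.5), 3.0, 5.0, avg))
```

Because the kernel footprint is set by `height` and `width` rather than by the weight shape, the same 3 x 5 weights can cover a small or a large region, which is the property the abstract attributes to learning the kernel scale.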

Paper Structure

This paper contains 25 sections, 15 equations, 18 figures, 12 tables.

Figures (18)

  • Figure 1: Top row: The comprehensive flowchart of remote sensing pansharpening via a DL-based approach. Bottom row: An illustrative example of our Adaptive Rectangular Convolution (ARConv), boasting two distinct advantages: 1) its convolution kernels can adaptively modify sampling positions in accordance with object sizes; 2) the quantity of sampling points is dynamically determined across various feature maps, for instance, achieving a $5\times 3$ adaptive rectangular convolution, which, to our knowledge, is the first attempt.
  • Figure 2: Diagrams illustrating the working principles of four types of convolutional kernels. (a) Standard Convolution. (b) Deformable Convolution [Dai2017DeformableCN, Zhu2018DeformableCV]. (c) Multi-scale Convolution [PYconv, Li2019SelectiveKN]. (d) Our proposed Convolution (ARConv).
  • Figure 3: Overview of the ARConv architecture. This module consists of four main parts. The first part addresses the learning process of the convolution kernel's height and width. The second part focuses on the selection process for the number of sampling points of the convolution kernel. The third part simulates the generation process of the sampling map $\mathbf{S}$ using the grid center position $\mathbf{p}_0$ as an example. The final part describes the convolution operation process of ARConv.
  • Figure 4: Overall architecture of ARNet. ARNet replaces the standard convolution in U-Net's Resblock with ARConv to create AR-Resblock. The model has down-sampling blocks to extract high-level features and up-sampling blocks to restore spatial resolution with transposed convolutions. Skip connections help transfer detailed spatial information.
  • Figure 5: Qualitative comparison of benchmark methods on WV3 reduced-resolution dataset. Top: RGB outputs; Bottom: residuals vs. ground truth. See the supplementary material for details.
  • ...and 13 more figures