Adaptive Rectangular Convolution for Remote Sensing Pansharpening
Xueyang Wang, Zhixin Zheng, Jiandong Shao, Yule Duan, Liang-Jian Deng
TL;DR
CNN-based remote sensing pansharpening typically relies on convolutions with a fixed square window and a fixed number of sampling points, which capture objects of varying scales poorly. The authors introduce Adaptive Rectangular Convolution (ARConv), which learns the kernel height $h$ and width $w$ and dynamically selects the number of sampling points. It generates a sampling map and uses bilinear interpolation together with an affine transformation to enhance spatial flexibility. ARNet is built by replacing the standard convolutions in a U-Net with ARConv, and it achieves strong performance on the WV3, QB, and GF2 datasets at both reduced and full resolution. Ablations and visualizations confirm the benefits of height/width adaptation, sampling-point adaptation, and the affine transformation. The work demonstrates robust scale-adaptive feature extraction and offers a practical plug-and-play module for improving pansharpening quality in diverse remote sensing scenarios.
Abstract
Recent advancements in convolutional neural network (CNN)-based techniques for remote sensing pansharpening have markedly enhanced image quality. However, conventional convolutional modules in these methods have two critical drawbacks. First, the sampling positions in convolution operations are confined to a fixed square window. Second, the number of sampling points is preset and remains unchanged. Given the diverse object sizes in remote sensing images, these rigid parameters lead to suboptimal feature extraction. To overcome these limitations, we introduce an innovative convolutional module, Adaptive Rectangular Convolution (ARConv). ARConv adaptively learns both the height and width of the convolutional kernel and dynamically adjusts the number of sampling points based on the learned scale. This approach enables ARConv to effectively capture scale-specific features of various objects within an image, optimizing kernel sizes and sampling locations. Additionally, we propose ARNet, a network architecture in which ARConv is the primary convolutional module. Extensive evaluations across multiple datasets reveal the superiority of our method in enhancing pansharpening performance over previous techniques. Ablation studies and visualization further confirm the efficacy of ARConv.
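The idea of sampling over a learned rectangle can be illustrated with a minimal, single-channel sketch in plain Python. It is not the authors' implementation: the function names (`odd_point_count`, `rect_offsets`, `bilinear`, `rect_conv_at`), the rule that maps a learned scale to an odd number of sampling points, and the zero padding outside the image are all assumptions made for illustration, and the affine transformation is omitted. In ARConv the height and width are learned; here they are passed in as plain numbers.

```python
import math

def odd_point_count(scale, max_points=9):
    """Hypothetical rule (an assumption, not from the paper): map a learned
    scale to an odd number of sampling points, clamped to [3, max_points]."""
    k = int(math.floor(scale))
    if k % 2 == 0:
        k += 1
    return max(3, min(k, max_points))

def rect_offsets(h, w, kh, kw):
    """Return kh * kw (dy, dx) offsets evenly spaced over an h-by-w
    rectangle centred on the origin (row-major order). Needs kh, kw >= 2."""
    ys = [(i / (kh - 1) - 0.5) * h for i in range(kh)]
    xs = [(j / (kw - 1) - 0.5) * w for j in range(kw)]
    return [(dy, dx) for dy in ys for dx in xs]

def bilinear(img, y, x):
    """Bilinearly interpolate img (list of rows) at fractional (y, x).
    Positions outside the image contribute zero (assumed padding)."""
    rows, cols = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < rows and 0 <= xx < cols:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value

def rect_conv_at(img, cy, cx, h, w, weights):
    """Convolution response at (cy, cx): the kernel weights are applied to
    samples taken on an h-by-w rectangle instead of a fixed square grid."""
    kh, kw = len(weights), len(weights[0])
    offsets = rect_offsets(h, w, kh, kw)
    flat = [wt for row in weights for wt in row]
    return sum(wt * bilinear(img, cy + dy, cx + dx)
               for wt, (dy, dx) in zip(flat, offsets))

# Usage: a 5x5 linear ramp sampled with a 3x3 averaging kernel stretched
# over a tall, narrow rectangle (h=3.0, w=1.5) centred at pixel (2, 2).
image = [[5.0 * r + c for c in range(5)] for r in range(5)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
response = rect_conv_at(image, 2, 2, 3.0, 1.5, kernel)
```

Because the sampling positions are generally fractional, each sample is read by bilinear interpolation; that is also what makes the response vary smoothly with `h` and `w`, so a network could in principle learn them by gradient descent.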
