Spatial-Spectral Binarized Neural Network for Panchromatic and Multi-spectral Images Fusion

Yizhen Jiang, Mengting Ma, Anqi Zhu, Xiaowen Ma, Jiaxin Li, Wei Zhang

TL;DR

This work tackles pan-sharpening for remote sensing on resource-limited devices by introducing S2BNet, a binarized network built around Spatial-Spectral Binarized Convolution (S2B-Conv). S2B-Conv combines a Spectral-Redistribution Mechanism, which learns data-driven channel-wise scaling and bias to adapt spectral distributions, with a Gabor Spatial Feature Amplifier that uses randomly parameterized Gabor kernels to capture multi-scale, multi-directional spatial textures. The method binarizes activations and weights to enable efficient XNOR-like computation, while residual connections and RPReLU help preserve information during binarization. Experiments on WorldView-2, GaoFen-2, and QuickBird datasets show that S2BNet surpasses other binary networks by large margins and approaches, or matches, full-precision pan-sharpening performance, making it suitable for deployment on edge devices; code will be released to support reproducibility and further research.
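The two ideas in this summary, a channel-wise affine redistribution before binarization and an XNOR-style binary dot product, can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names, the toy band values, and the use of mean-centering as a stand-in for the learned scaling and bias are all assumptions made for illustration.

```python
def redistribute(channel, scale, bias):
    """Channel-wise affine transform; scale and bias are hypothetical stand-ins
    for parameters that the paper says are learned dynamically."""
    return [scale * v + bias for v in channel]


def binarize(values):
    """Map real values to {-1, +1} with the sign function (0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a {-1, +1} vector into an int: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_dot(a_word, b_word, n):
    """Dot product of two packed {-1, +1} vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ b_word) & mask      # bit set where the two signs match
    matches = bin(agree).count("1")        # popcount
    return 2 * matches - n                 # matches minus mismatches


# Toy spectral band (assumed values): everything is positive, so direct
# binarization collapses the band to all +1 and discards its structure.
band = [0.62, 0.55, 0.71, 0.58, 0.80, 0.49]
direct = binarize(band)

# Shifting by the band mean (an assumed, non-learned choice) re-centers the
# distribution so the signs carry information again.
mean = sum(band) / len(band)
a = binarize(redistribute(band, scale=1.0, bias=-mean))

# Toy real-valued weights (assumed), binarized the same way.
w = binarize([-0.5, -0.9, 0.3, 0.8, 1.1, -0.2])

reference = sum(x * y for x, y in zip(a, w))          # ordinary dot product
fast = xnor_dot(pack_bits(a), pack_bits(w), len(a))   # bitwise equivalent
print(direct, a, reference, fast)
```

The bitwise path returns the same value as the ordinary dot product over the sign vectors, which is why binarized convolutions can replace multiply-accumulate operations with XNOR and popcount on packed words.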

Abstract

Remote sensing pan-sharpening aims to reconstruct spatial-spectral properties by fusing panchromatic (PAN) images with low-resolution multi-spectral (LR-MS) images to generate high-resolution multi-spectral (HR-MS) images. Although deep learning-based models have achieved excellent performance, they often come with high computational complexity, which hinders their application on resource-limited devices. In this paper, we explore the feasibility of applying binary neural networks (BNNs) to pan-sharpening. However, binarizing pan-sharpening models raises two main issues: (i) binarization causes serious spectral distortion because the spectral distributions of the PAN and LR-MS images are inconsistent; (ii) a common binary convolution kernel adapts poorly to the multi-scale and anisotropic spatial features of remote sensing objects, resulting in serious degradation of contours. To address these issues, we design a customized spatial-spectral binarized convolution (S2B-Conv), which is composed of the Spectral-Redistribution Mechanism (SRM) and the Gabor Spatial Feature Amplifier (GSFA). Specifically, SRM employs an affine transformation whose scaling and bias parameters are generated through a dynamic learning process. GSFA randomly selects different frequencies and angles within a preset range, which enables it to better handle multi-scale and multi-directional spatial features. A series of S2B-Conv layers forms a new binary network for pan-sharpening, dubbed S2BNet. Extensive quantitative and qualitative experiments show that our high-efficiency binarized pan-sharpening method attains promising performance.
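To make the GSFA description more concrete, the sketch below builds a small bank of real-valued Gabor kernels whose frequency and orientation are drawn uniformly from preset ranges. It uses the standard Gabor formula (a Gaussian envelope multiplied by a cosine carrier); the kernel size, the ranges, the values of sigma and gamma, and all function names are assumptions for illustration and are not taken from the paper or its Algorithm 1.

```python
import math
import random


def gabor_kernel(size, frequency, theta, sigma=2.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size must be odd).

    frequency: spatial frequency of the cosine carrier (cycles per pixel)
    theta:     orientation of the carrier in radians
    sigma, gamma, psi: envelope width, aspect ratio, phase (assumed defaults)
    """
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates so the carrier runs along angle theta.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * frequency * xr + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def random_gabor_bank(num_kernels, size=3, freq_range=(0.1, 0.4),
                      theta_range=(0.0, math.pi), seed=0):
    """Sample frequency and angle uniformly within preset (assumed) ranges."""
    rng = random.Random(seed)
    bank = []
    for _ in range(num_kernels):
        frequency = rng.uniform(*freq_range)
        theta = rng.uniform(*theta_range)
        bank.append(gabor_kernel(size, frequency, theta))
    return bank


bank = random_gabor_bank(4)
for kernel in bank:
    print([[round(v, 3) for v in row] for row in kernel])
```

Each sampled kernel responds most strongly to edges at its own orientation and scale, so a bank with varied frequencies and angles covers multi-scale, multi-directional textures. How the paper combines such kernels with the binary convolution weights is not stated in this section and is left out of the sketch.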

Paper Structure

This paper contains 18 sections, 15 equations, 3 figures, 6 tables, 1 algorithm.

Figures (3)

  • Figure 1: The overall framework of our method. Our model primarily consists of the novel convolution (S2B-Conv), which incorporates two core modules: the Spectral-Redistribution Mechanism (denoted as "Redistribution") and the Gabor Spatial Feature Amplifier (denoted as "Gabor"). The specific implementation of Gabor is detailed in Algorithm 1.
  • Figure 2: Visual comparison between our model and other binary methods on a GF-2 example. The top two rows show the reconstructed results and the corresponding MAE maps for the reduced-resolution example, and the last row shows the reconstructed results for the full-resolution example.
  • Figure 3: Visual comparison between our model and other binary methods on a WV-2 example. The top two rows show the reconstructed results and the corresponding MAE maps for the reduced-resolution example, and the last row shows the reconstructed results for the full-resolution example.