Boosting Hyperspectral Image Classification with Gate-Shift-Fuse Mechanisms in a Novel CNN-Transformer Approach

Mohamed Fadhlallah Guerri, Cosimo Distante, Paolo Spagnolo, Fares Bougourzi, Abdelmalik Taleb-Ahmed

TL;DR

This paper introduces an HSI classification model that combines two convolutional blocks, a Gate-Shift-Fuse (GSF) block and a transformer block, leveraging the strengths of CNNs in local feature extraction and of transformers in long-range context modelling.

Abstract

In Hyperspectral Image (HSI) classification, every pixel sample is assigned to a land-cover type. CNN-based techniques have notably advanced HSI classification through their strong feature representation capabilities. However, acquiring deep features remains a challenge for these CNN-based methods. In contrast, transformer models are adept at extracting high-level semantic features, offering a complementary strength. This paper's main contribution is an HSI classification model that includes two convolutional blocks, a Gate-Shift-Fuse (GSF) block and a transformer block. The model leverages the strengths of CNNs in local feature extraction and of transformers in long-range context modelling. The GSF block is designed to strengthen the extraction of local and global spatial-spectral features. An effective attention mechanism module is also proposed to enhance the extraction of information from HSI cubes. The proposed method is evaluated on four well-known datasets (Indian Pines, Pavia University, WHU-Hi-LongKou and WHU-Hi-HanChuan), where the proposed framework achieves superior results compared to the other models evaluated.
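Patch-based HSI classifiers typically label each pixel from a small spatial cube centred on it. The minimal NumPy sketch below cuts such cubes from an (H, W, B) image. The function name `extract_cubes`, the default patch size and the reflect padding at image borders are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def extract_cubes(hsi, coords, patch_size=9):
    """Return one (patch_size, patch_size, B) cube per (row, col) in coords.

    hsi: array of shape (H, W, B). Borders use reflect padding (an assumption;
    zero padding is another common choice).
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so the pixel sits at the centre")
    r = patch_size // 2
    # Pad only the two spatial axes; the spectral axis is left untouched
    padded = np.pad(hsi, ((r, r), (r, r), (0, 0)), mode="reflect")
    # Original pixel (i, j) sits at (i + r, j + r) in padded coordinates,
    # so the window starting at (i, j) is centred on it
    return np.stack([padded[i:i + patch_size, j:j + patch_size, :] for i, j in coords])


hsi = np.random.rand(20, 30, 15)
cubes = extract_cubes(hsi, [(0, 0), (10, 12)], patch_size=5)
print(cubes.shape)  # (2, 5, 5, 15)
```

Each cube keeps all bands of its neighbourhood, so the classifier sees both the pixel's spectrum and its spatial context.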

Paper Structure (18 sections, 3 equations, 5 figures, 8 tables)

Figures (5)

  • Figure 1: The proposed framework for HSI classification. The process begins with PCA to reduce the spectral dimensionality of the HSI data. The data then undergo a spatial feature extraction phase using 3D and 2D convolution layers (Conv3D, Conv2D). The extracted features are then processed by the GSF block, which enhances the local and global feature representation. A tokenizer converts the features into a sequence of tokens, which are fed into a transformer encoder that captures long-range dependencies and high-level semantic features. Finally, the output passes through a linear layer and a softmax activation to classify the hyperspectral pixels, and the classification result is presented as a color-coded map.
  • Figure 2: The GSF framework integrates group gating, forward and backward spectral shifts, and fusion mechanisms. Gating is performed by a single 3D convolution kernel calibrated with tanh, and fusion by a single 2D convolution kernel calibrated with sigmoid. As a result, adopting GSF adds a negligible number of parameters.
  • Figure 3: (a) The main Transformer Encoder framework. (b) Multi-Head Self-Attention (MSA). (c) Self-Attention (SA) module
  • Figure 4: Ground-truth image of WHU-Hi-LongKou dataset
  • Figure 5: figure
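The PCA step at the start of the Figure 1 pipeline can be sketched with an SVD of the centred pixel-by-band matrix. This is a minimal NumPy version under stated assumptions: the input is an (H, W, B) array, and `n_components=30` is a placeholder, since the number of retained components is not given here.

```python
import numpy as np


def pca_reduce(hsi, n_components=30):
    """Project an (H, W, B) cube onto its top principal components.

    Returns the reduced (H, W, n_components) cube and the fraction of
    variance explained by each kept component.
    """
    h, w, b = hsi.shape
    if not 0 < n_components <= min(h * w, b):
        raise ValueError("n_components must be in [1, min(H*W, B)]")
    x = hsi.reshape(-1, b).astype(np.float64)  # one row per pixel, one column per band (a copy)
    x -= x.mean(axis=0)                        # centre each band
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    reduced = x @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return reduced.reshape(h, w, n_components), explained[:n_components]


cube = np.random.rand(50, 40, 200)
reduced, explained = pca_reduce(cube, n_components=30)
print(reduced.shape)  # (50, 40, 30)
```

The reduced cube is what the Conv3D/Conv2D stages would then consume; the explained-variance ratios help judge how many components are worth keeping.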
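The gate, shift and fuse steps of Figure 2 can be illustrated with a simplified NumPy sketch. It follows the general Gate-Shift-Fuse pattern (split the channels into two groups, gate each group with tanh, shift the gated part forward or backward along the spectral axis, then fuse with sigmoid weights), but it is not the authors' implementation. The 3D gating convolution is replaced by a hypothetical per-position channel projection `w_gate`, the 2D fusion convolution by two hypothetical scalars `w_fuse` applied to channel-averaged features, and the (C, D, H, W) layout with D as the spectral axis is also an assumption.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def shift(x, offset, axis):
    """Shift x by `offset` steps along `axis`, zero-filling vacated positions."""
    n = x.shape[axis]
    if offset == 0:
        return x.copy()
    out = np.zeros_like(x)
    if abs(offset) >= n:
        return out
    src = [slice(None)] * x.ndim
    dst = [slice(None)] * x.ndim
    if offset > 0:  # forward: position i receives position i - offset
        src[axis], dst[axis] = slice(0, n - offset), slice(offset, n)
    else:  # backward: position i receives position i + |offset|
        src[axis], dst[axis] = slice(-offset, n), slice(0, n + offset)
    out[tuple(dst)] = x[tuple(src)]
    return out


def gsf(x, w_gate, w_fuse):
    """Simplified gate-shift-fuse on a (C, D, H, W) tensor, D = spectral axis.

    w_gate: (2, C) hypothetical projection standing in for the 3D gating conv.
    w_fuse: (2,) hypothetical weights standing in for the 2D fusion conv.
    """
    c = x.shape[0]
    if c < 2:
        raise ValueError("need at least two channels to form two groups")
    groups = [x[: c // 2], x[c // 2:]]
    # Gate: one tanh-calibrated map per group, each of shape (D, H, W)
    gates = np.tanh(np.einsum("gc,cdhw->gdhw", w_gate, x))
    fused = []
    for gate, xg, offset in zip(gates, groups, (1, -1)):
        gated = gate * xg          # part of the group that gets shifted
        residual = xg - gated      # part that stays in place
        shifted = shift(gated, offset, axis=1)  # group 1 forward, group 2 backward
        # Fuse: per-position sigmoid weight from channel-averaged features
        a = sigmoid(w_fuse[0] * shifted.mean(axis=0) + w_fuse[1] * residual.mean(axis=0))
        fused.append(a * shifted + (1.0 - a) * residual)
    return np.concatenate(fused, axis=0)


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 10, 5, 5))  # C=8 channels, D=10 spectral slices, 5x5 patch
out = gsf(feats, rng.standard_normal((2, 8)), rng.standard_normal(2))
print(out.shape)  # (8, 10, 5, 5)
```

Even in this reduced form the block adds only 2C + 2 weights, which mirrors the caption's point that GSF adds few parameters; a real implementation would learn the gating and fusion kernels as convolutions in a deep learning framework.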
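The three panels of Figure 3 can be sketched in NumPy as follows. Self-attention uses the standard scaled dot-product form softmax(Q K^T / sqrt(d)) V, and multi-head self-attention runs it on per-head slices before rejoining them. The pre-norm layout of `encoder_block`, the GELU MLP, the parameter-free layer norm and the random weights are assumptions, since the summary does not specify them.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(q, k, v):
    """(c) SA: softmax(Q K^T / sqrt(d)) V over the last two axes."""
    d = q.shape[-1]
    weights = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(d))
    return weights @ v, weights


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """(b) MSA on x of shape (n_tokens, d_model); weights are (d_model, d_model)."""
    n, d_model = x.shape
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    def split(t):  # (n, d_model) -> (num_heads, n, d_head)
        return t.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    heads, _ = self_attention(split(x @ w_q), split(x @ w_k), split(x @ w_v))
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)  # rejoin the heads
    return concat @ w_o


def layer_norm(x, eps=1e-5):
    """Layer norm without learnable scale and shift (simplification)."""
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)


def gelu(x):  # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def encoder_block(x, params, num_heads):
    """(a) Pre-norm encoder layer (assumed layout): MSA and MLP, each with a residual."""
    h = x + multi_head_self_attention(layer_norm(x), *params["attn"], num_heads)
    return h + gelu(layer_norm(h) @ params["w1"]) @ params["w2"]


rng = np.random.default_rng(0)
tokens = rng.standard_normal((4, 16))  # e.g. 4 tokens from the tokenizer, 16-dim each
params = {
    "attn": [rng.standard_normal((16, 16)) * 0.1 for _ in range(4)],
    "w1": rng.standard_normal((16, 32)) * 0.1,
    "w2": rng.standard_normal((32, 16)) * 0.1,
}
print(encoder_block(tokens, params, num_heads=4).shape)  # (4, 16)
```

Because every token attends to every other token, the encoder can relate features from anywhere in the patch, which is the long-range context modelling the framework relies on.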