Boosting the Generalization Ability for Hyperspectral Image Classification using Spectral-spatial Axial Aggregation Transformer

Enzhe Zhao, Zhichang Guo, Shengzhu Shi, Yao Li, Jia Li, Dazhi Zhang

TL;DR

This work proposes a spectral-spatial axial aggregation transformer model, namely, SaaFormer, which preserves generalization across dataset partitions and shows excellent performance on background classification.

Abstract

In the hyperspectral image classification (HSIC) task, the most commonly used model validation paradigm partitions the data into training and test sets through pixel-wise random sampling. Trained on a small amount of data, a deep learning model can achieve almost perfect accuracy. However, our experiments show that this high accuracy is reached because the training and test datasets share a lot of information. On non-overlapping dataset partitions, otherwise well-performing models suffer significant performance degradation. To this end, we propose a spectral-spatial axial aggregation transformer model, namely SaaFormer, that preserves generalization across dataset partitions. SaaFormer applies a multi-level spectral extraction structure to segment the spectrum into multiple spectrum clips, such that the wavelength continuity of the spectrum across channels is preserved. For each spectrum clip, the axial aggregation attention mechanism, which integrates spatial features along multiple spectral axes, is applied to mine the spectral characteristics. Together, the multi-level spectral extraction and the axial aggregation attention emphasize spectral characteristics to improve model generalization. Experimental results on five publicly available datasets demonstrate that our model exhibits comparable performance on the random partition while significantly outperforming other methods on non-overlapping partitions. Moreover, SaaFormer shows excellent performance on background classification.
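
The idea of cutting a spectrum into clips while keeping neighbouring wavelengths together can be illustrated with a small sliding-window sketch. This is not the authors' implementation: the function name `spectrum_clips` and the parameters `clip_len` and `stride` are hypothetical, and the clip sizes and number of levels SaaFormer actually uses are not stated here.

```python
def spectrum_clips(spectrum, clip_len, stride):
    """Split one pixel's spectrum into contiguous, possibly overlapping clips.

    Each clip is a run of adjacent bands, so wavelength order is kept
    inside every clip. Trailing bands that cannot fill a whole clip are
    dropped in this sketch.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    if clip_len > len(spectrum):
        raise ValueError("clip_len exceeds the number of bands")
    last_start = len(spectrum) - clip_len
    return [spectrum[s:s + clip_len] for s in range(0, last_start + 1, stride)]


def multi_level_clips(spectrum, clip_lens, stride):
    """Repeat the segmentation at several clip lengths (hypothetical 'levels')."""
    return {n: spectrum_clips(spectrum, n, stride) for n in clip_lens}


# Toy spectrum with 10 bands, indexed 0..9 in wavelength order.
bands = list(range(10))
clips = spectrum_clips(bands, clip_len=4, stride=3)
levels = multi_level_clips(bands, clip_lens=(4, 8), stride=2)
```

With a stride smaller than the clip length, adjacent clips share bands, so no clip boundary cuts the spectrum without overlap.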

Paper Structure

This paper contains 33 sections, 4 theorems, 20 equations, 23 figures, 20 tables.

Key Result

Theorem 3.1

Suppose the dataset is divided into mutually exclusive training and testing sets through random permutation, where the training set accounts for a fraction $\alpha$ of the samples and the testing set for $1-\alpha$. If the sample size is $n\times n$, then for an arbitrary test sample $S$ the theorem gives the probability that $S$ shares information with the training set, together with the expected information benefit from the training dataset; here $H(S)$ denotes the entropy of $S$.
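
The effect this theorem describes can be explored numerically. The sketch below is a Monte Carlo estimate, not the theorem's closed form. It assumes that two samples "share information" when their $n\times n$ patches have at least one pixel in common, which is an illustrative definition and not necessarily the paper's. The function name `overlap_fraction` and its parameters are hypothetical.

```python
import random


def overlap_fraction(height, width, patch, alpha, seed=0):
    """Estimate the share of test patches that overlap some training patch.

    Pixels are split by random permutation: a fraction `alpha` become
    training centres, the rest test centres. Two patch x patch windows
    share a pixel exactly when their centres differ by at most
    patch - 1 along both axes (assumed overlap criterion).
    """
    rng = random.Random(seed)
    pixels = [(r, c) for r in range(height) for c in range(width)]
    rng.shuffle(pixels)
    n_train = max(1, round(alpha * len(pixels)))
    train = set(pixels[:n_train])
    test = pixels[n_train:]
    reach = patch - 1
    hits = 0
    for r, c in test:
        if any((r + dr, c + dc) in train
               for dr in range(-reach, reach + 1)
               for dc in range(-reach, reach + 1)):
            hits += 1
    return hits / len(test)


# 5% training pixels on a 40 x 40 scene, for three patch sizes.
frac_1 = overlap_fraction(40, 40, patch=1, alpha=0.05)
frac_3 = overlap_fraction(40, 40, patch=3, alpha=0.05)
frac_7 = overlap_fraction(40, 40, patch=7, alpha=0.05)
```

Under this criterion, single-pixel samples never overlap, while larger patches make overlap with the training set increasingly likely even when only a small fraction of pixels is used for training.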

Figures (23)

  • Figure 1: Overall architecture of SaaFormer for the HS image classification task. The model architecture consists of two primary components: the multi-level spectral extraction structure and the axial aggregation attention.
  • Figure 2: Diagram of the Axial Transformer Block in the proposed SaaFormer. The attention process compresses the feature map either vertically or horizontally and calculates self-attention separately for each axis.
  • Figure 3: The pixel-wise random sampling dataset partition. Five percent of the pixels are selected as the training dataset and marked in orange. The remaining pixels (blue and white) are samples in the test dataset.
  • Figure 4: Accuracy of different models as a function of the overlap rate between training samples and test samples.
  • Figure 5: Spectral feature visualizations: (a) and (b) are spectral features extracted by the CNN model ref15, while (c) and (d) are spectral features extracted by the 3DCNN model ref17. The blue and orange lines are spectral features of two samples from the same class, as in (a) and (c), or from different classes, as in (b) and (d).
  • ...and 18 more figures
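
The axis-wise attention described in the Figure 2 caption can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the SaaFormer block itself. The compression step is taken to be a mean over one spatial axis, and the query, key and value projections are identity maps with no learned weights. All function names are hypothetical.

```python
import math


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[i] for w, v in zip(weights, tokens))
                    for i in range(d)])
    return out


def axial_tokens(fmap, axis):
    """Average an H x W x C feature map over one spatial axis.

    'vertical' collapses the rows and leaves W tokens;
    'horizontal' collapses the columns and leaves H tokens.
    """
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    if axis == "vertical":
        return [[sum(fmap[r][col][ch] for r in range(h)) / h for ch in range(c)]
                for col in range(w)]
    return [[sum(fmap[r][col][ch] for col in range(w)) / w for ch in range(c)]
            for r in range(h)]


def axial_attention(fmap):
    """Run self-attention separately on the two compressed axes."""
    return (self_attention(axial_tokens(fmap, "vertical")),
            self_attention(axial_tokens(fmap, "horizontal")))


# Toy 2 x 3 feature map with 2 channels and constant values.
fmap = [[[1.0, 2.0] for _ in range(3)] for _ in range(2)]
along_width, along_height = axial_attention(fmap)
```

Attending over two short sequences of lengths W and H instead of all H x W positions is what makes axial attention cheaper than full 2-D self-attention.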

Theorems & Definitions (4)

  • Theorem 3.1
  • Theorem 3.2
  • Proposition 3.1
  • Proposition 3.2