Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken

Peifu Liu, Tingfa Xu, Jie Wang, Huan Chen, Huiyan Bai, Jianan Li

TL;DR

This work tackles boundary preservation and regional consistency in hyperspectral image classification by introducing the Dual-stage Spectral Supertoken Classifier (DSTC). DSTC first clusters spectrally similar pixels into spectral supertokens using spectrum-derivative features, then classifies these tokens with a Transformer and projects the results back to pixel space. During training, the token predictions $\hat{\boldsymbol{S}} \in \mathbb{R}^{M\times C'}$ are supervised by class-proportion-based soft labels (CPSL) $\boldsymbol{L} \in \mathbb{R}^{M\times C'}$. Core contributions include spectrum-derivative-based pixel clustering, semantic feature aggregation that forms the supertokens $\boldsymbol{S} \in \mathbb{R}^{M\times C_2}$, CPSL supervision, and an end-to-end trainable two-stage pipeline. Experiments on WHU-OHS, IP, KSC, and UP report strong accuracy with efficiency advantages (e.g., reduced FLOPs). The approach improves boundary delineation and regional coherence while enabling near real-time inference; generalization is validated by HS-SOD experiments, and the code is publicly released.
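The aggregation and back-projection steps summarized above can be illustrated with a minimal sketch. It assumes that each pixel is hard-assigned to exactly one supertoken and that aggregation is a plain mean over the assigned pixels' features; the summary does not specify the actual aggregation rule, so mean pooling is a stand-in here. The function names `aggregate_tokens` and `project_to_pixels` are hypothetical and are not taken from the DSTC code.

```python
def aggregate_tokens(features, assignment, num_tokens):
    """Mean-pool per-pixel feature vectors into one vector per supertoken.

    features:   list of per-pixel feature vectors (pixels flattened to 1-D)
    assignment: supertoken index of each pixel, same length as features
    Assumption: plain averaging; the paper's aggregation may differ.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_tokens)]
    counts = [0] * num_tokens
    for feat, tok in zip(features, assignment):
        counts[tok] += 1
        for d in range(dim):
            sums[tok][d] += feat[d]
    # Empty supertokens keep an all-zero feature vector.
    return [
        [s / counts[t] if counts[t] else 0.0 for s in sums[t]]
        for t in range(num_tokens)
    ]


def project_to_pixels(token_classes, assignment):
    """Give every pixel the class predicted for its supertoken."""
    return [token_classes[tok] for tok in assignment]


# Four pixels with two-dimensional features, grouped into two supertokens.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
assignment = [0, 0, 1, 1]

tokens = aggregate_tokens(features, assignment, num_tokens=2)
pixel_map = project_to_pixels(["water", "forest"], assignment)
```

Because all pixels in a supertoken inherit the same class, the projected map is constant inside each cluster and changes only at cluster borders, which is the intuition behind the regional consistency and boundary claims.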

Abstract

Hyperspectral image classification, a task that assigns pre-defined classes to each pixel in a hyperspectral image of remote sensing scenes, often faces challenges due to the neglect of correlations between spectrally similar pixels. This oversight can lead to inaccurate edge definitions and difficulties in managing minor spectral variations in contiguous areas. To address these issues, we introduce the novel Dual-stage Spectral Supertoken Classifier (DSTC), inspired by superpixel concepts. DSTC employs spectrum-derivative-based pixel clustering to group pixels with similar spectral characteristics into spectral supertokens. By projecting the classification of these tokens onto the image space, we achieve pixel-level results that maintain regional classification consistency and precise boundaries. Moreover, recognizing the diversity within tokens, we propose a class-proportion-based soft label. This label adaptively assigns weights to different categories based on their prevalence, effectively managing data distribution imbalances and enhancing classification performance. Comprehensive experiments on WHU-OHS, IP, KSC, and UP datasets corroborate the robust classification capabilities of DSTC and the effectiveness of its individual components. Code will be publicly available at https://github.com/laprf/DSTC.
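To make the class-proportion-based soft label concrete, the sketch below computes, for each supertoken, the fraction of its labeled pixels that belong to each class. It is one plausible reading of the description above, not the authors' implementation. The function name `class_proportion_soft_labels` and the `ignore_label` convention for unlabeled pixels are assumptions introduced for illustration.

```python
from collections import Counter


def class_proportion_soft_labels(pixel_labels, assignment, num_tokens,
                                 num_classes, ignore_label=-1):
    """Return one soft label (a list of class proportions) per supertoken.

    pixel_labels: ground-truth class index of each pixel (flattened)
    assignment:   supertoken index of each pixel, same length
    ignore_label: value marking unlabeled pixels (an assumed convention)
    """
    counts = [Counter() for _ in range(num_tokens)]
    for label, tok in zip(pixel_labels, assignment):
        if label != ignore_label:
            counts[tok][label] += 1

    soft_labels = []
    for token_counts in counts:
        total = sum(token_counts.values())
        # Tokens without any labeled pixel get an all-zero row.
        soft_labels.append([
            token_counts[k] / total if total else 0.0
            for k in range(num_classes)
        ])
    return soft_labels


# Eight pixels, two supertokens, three classes; -1 marks an unlabeled pixel.
pixel_labels = [0, 0, 0, 1, 2, 2, -1, 2]
assignment = [0, 0, 0, 0, 1, 1, 1, 1]

soft = class_proportion_soft_labels(pixel_labels, assignment,
                                    num_tokens=2, num_classes=3)
# soft[0] is [0.75, 0.25, 0.0]; soft[1] is [0.0, 0.0, 1.0]
```

In this toy case the first supertoken straddles two classes, so its target spreads weight across both instead of forcing a single hard label. Rows like these could serve as targets for the token-wise predictions during training.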

Paper Structure (15 sections, 12 equations, 6 figures, 7 tables)


Figures (6)

  • Figure 1: (a) Single-stage pixel-wise classification models struggle with minor spectral variations and fail to delineate boundaries precisely, as shown in the black-framed area. In contrast, (b) our Dual-stage Spectral Supertoken Classifier clusters similar pixels into spectral supertokens for token-wise classification, yielding improved classification within contiguous regions. This process is enhanced by (c) class-proportion-based soft labels, derived from the proportion of each land cover type within each supertoken's boundary (illustrated with the blue supertoken).
  • Figure 2: The Dual-stage Spectral Supertoken Classifier begins by extracting semantic features with (a) a spatial-preserved feature encoder. It then groups similar pixels using (b) spectrum-derivative-based pixel clustering. The spectral supertokens are obtained through (c) semantic feature aggregation and are subsequently classified by a Transformer. The final classification map is generated by projecting these token-wise classifications back into the image space. During training, each token is supervised by (d) a class-proportion-based soft label. The colors in the soft labels represent different land cover types, and the proportion of each color reflects the share of that land cover within the token.
  • Figure 3: Qualitative results on the WHU-OHS dataset. Our DSTC is closest to the ground truth, demonstrating the best classification capability.
  • Figure 4: Confusion matrices of CLSJE and our DSTC-R on a subset of class IDs.
  • Figure 5: Visualization of clustered spectrally similar pixels. For better visualization, the crop factor is set to 4.
  • ...and 1 more figure