Hybrid State-Space and GRU-based Graph Tokenization Mamba for Hyperspectral Image Classification

Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Muhammad Usama, Manuel Mazzara, Salvatore Distefano, Adil Mehmood Khan, Danfeng Hong

TL;DR

GraphMamba introduces a hybrid framework for hyperspectral image classification that fuses spectral-spatial tokenization, graph token prioritization, cross-attention, and a GRU-based state-space model to capture complex spectral-spatial dynamics while maintaining scalability. The dual tokenization uses $1\times 1$ spectral convolutions and $3\times 3$ spatial convolutions, followed by graph-based prioritization, a cross-attention module, and a GRU-driven sequence model to produce robust classifications. Ablation studies show that combining graph tokenization with attention yields the strongest performance across diverse datasets, achieving state-of-the-art accuracy with significantly fewer parameters than CNN/Transformer baselines and with competitive runtime and memory profiles. The approach generalizes well across datasets with varying spectral bands and resolutions, highlighting its practicality for real-world HSI tasks and resource-constrained deployments.
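As a rough illustration of the dual tokenization, the sketch below applies a $1\times 1$ convolution (a per-pixel mix of bands) and a $3\times 3$ convolution (a neighborhood mix) to a tiny patch. The function names, the nested-list layout `patch[row][col][band]`, the single output channel, the zero padding, the band averaging before the spatial kernel, and the weights are all assumptions made for this example and are not taken from the paper.

```python
# Illustrative sketch only: names, weights, and shapes are assumptions,
# not the authors' implementation.

def spectral_tokens(patch, band_weights):
    """1x1 convolution: mix the bands of each pixel independently."""
    return [
        [sum(w * v for w, v in zip(band_weights, pixel)) for pixel in row]
        for row in patch
    ]


def spatial_tokens(patch, kernel):
    """3x3 convolution over the band-averaged patch, with zero padding."""
    height, width = len(patch), len(patch[0])
    # Simplification: collapse bands so the kernel sees one value per pixel.
    mean_map = [[sum(pixel) / len(pixel) for pixel in row] for row in patch]
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Positions outside the patch contribute zero.
                    if 0 <= rr < height and 0 <= cc < width:
                        acc += kernel[dr + 1][dc + 1] * mean_map[rr][cc]
            out[r][c] = acc
    return out


# A 3x3 patch with 2 bands; every pixel holds the values (1.0, 3.0).
patch = [[[1.0, 3.0] for _ in range(3)] for _ in range(3)]
band_weights = [0.5, 0.5]                          # hypothetical 1x1 kernel
box_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]   # hypothetical 3x3 kernel

spec = spectral_tokens(patch, band_weights)
spat = spatial_tokens(patch, box_kernel)
```

The spectral path never looks at neighboring pixels, while the spatial path does, which is why the two token sets carry complementary information. A practical model would use learned multi-channel kernels rather than the fixed ones here.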

Abstract

Hyperspectral image (HSI) classification plays a pivotal role in domains such as environmental monitoring, agriculture, and urban planning. However, it faces significant challenges due to the high-dimensional nature of the data and the complex spectral-spatial relationships inherent in HSI. Traditional methods, including conventional machine learning and convolutional neural networks (CNNs), often struggle to effectively capture these intricate spectral-spatial features and global contextual information. Transformer-based models, while powerful in capturing long-range dependencies, often demand substantial computational resources, posing challenges in scenarios where labeled datasets are limited, as is commonly seen in HSI applications. To overcome these challenges, this work proposes GraphMamba, a hybrid model that combines spectral-spatial token generation, graph-based token prioritization, and cross-attention mechanisms. The model introduces a novel hybridization of state-space modeling and Gated Recurrent Units (GRU), capturing both linear and nonlinear spatial-spectral dynamics. GraphMamba enhances the ability to model complex spatial-spectral relationships while maintaining scalability and computational efficiency across diverse HSI datasets. Through comprehensive experiments, we demonstrate that GraphMamba outperforms existing state-of-the-art models, offering a scalable and robust solution for complex HSI classification tasks.
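One way to picture a recurrence that combines linear state-space dynamics with GRU-style nonlinear gating is the toy scalar sketch below. It is not the paper's formulation: the update rule, the use of only an update gate, and every parameter value (`a`, `b`, `w_z`, `u_z`, `w_c`, `u_c`) are assumptions chosen to keep the example small.

```python
import math

# Toy scalar recurrence; an assumed illustration, not the GraphMamba equations.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hybrid_step(h_prev, x, a=0.9, b=0.1, w_z=1.0, u_z=1.0, w_c=1.0, u_c=1.0):
    """Blend a linear state-space update with a gated nonlinear candidate."""
    h_lin = a * h_prev + b * x                  # linear state-space update
    z = sigmoid(w_z * x + u_z * h_prev)         # GRU-style update gate in (0, 1)
    h_cand = math.tanh(w_c * x + u_c * h_lin)   # nonlinear candidate state
    return (1.0 - z) * h_lin + z * h_cand       # gate chooses between the two


def run_sequence(tokens, h0=0.0):
    """Scan the token sequence and return the hidden state after each step."""
    h = h0
    states = []
    for x in tokens:
        h = hybrid_step(h, x)
        states.append(h)
    return states


states = run_sequence([0.0, 1.0, -1.0, 0.5])
```

When the gate `z` is near 0 the step reduces to the linear update, and when it is near 1 the nonlinear candidate dominates, so a single recurrence can express both kinds of dynamics.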

Paper Structure

This paper contains 15 sections, 19 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: The proposed model processes the input HSI by slicing it into spatial and spectral patches for feature extraction. Spatial tokens, capturing spatial patterns, are generated via spatial convolutions, while spectral tokens, emphasizing spectral features, are obtained through spectral convolutions. A graph token prioritization module computes importance scores for these tokens, enabling the construction of a graph adjacency matrix that effectively captures spatial-spectral relationships. A cross-attention mechanism further refines the tokens by facilitating interactions between prioritized tokens, which are then fused with the graph output. This fused information is passed through a hybrid SSM layer, which seamlessly integrates spatial and spectral dependencies for robust classification. The arrows indicate data flow between modules, with bold arrows representing the primary flow and dashed arrows illustrating state transitions. This architecture ensures efficient feature extraction and enhanced classification performance.
  • Figure 2: Visualization of permuted tokens with embeddings. Each node represents a token, categorized by degree order: Low (red), Mid (blue), and High (yellow). Below each node, small colored squares illustrate different embeddings, with distinct colors used to emphasize varying embedding representations. The arrangement highlights the distribution and variety of embeddings for different tokens in a visual format.
  • Figure 3: Cross-attention process over spatial and spectral tokens.
  • Figure 4: Overall accuracy (OA) across different datasets as a function of the percentage of training samples, along with the execution time for each run.
  • Figure 5: Comparison of training time, inference time, and memory usage across different datasets. The dataset sizes are as follows: $512 \times 217 \times 224$ for SA, $340 \times 1905 \times 144$ for UH, $610 \times 610 \times 103$ for PU, $1096 \times 1096 \times 102$ for PC, and $1217 \times 303 \times 274$ for HC. The HC dataset exhibits significantly higher training and inference times, indicating that its larger size requires more time for both processes.
  • ...and 6 more figures
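
The graph token prioritization described in the captions of Figures 1 and 2 (importance scores, an adjacency matrix, and tokens grouped by degree) might look roughly like the sketch below. The similarity measure (cosine), the threshold of 0.9, the use of node degree as the importance score, and all function names are assumptions for illustration; the captions do not specify them.

```python
import math

# Assumed illustration of degree-based token prioritization,
# not the authors' implementation.


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adjacency(tokens, threshold=0.9):
    """Connect two distinct tokens when their cosine similarity is high enough."""
    n = len(tokens)
    return [
        [1 if i != j and cosine(tokens[i], tokens[j]) >= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def prioritize(tokens, threshold=0.9):
    """Return the adjacency matrix, node degrees, and indices sorted by degree."""
    adj = adjacency(tokens, threshold)
    degrees = [sum(row) for row in adj]
    # Highest-degree tokens first; ties keep their original order.
    order = sorted(range(len(tokens)), key=lambda i: degrees[i], reverse=True)
    return adj, degrees, order


tokens = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],   # nearly orthogonal to the others, so it ends up isolated
]
adj, degrees, order = prioritize(tokens)
```

In this toy input the first three tokens are mutually similar and form a connected group, while the last token has no neighbors and is ranked last.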