Dynamic Memory-enhanced Transformer for Hyperspectral Image Classification

Muhammad Ahmad, Manuel Mazzara, Salvatore Distefano, Adil Mehmood Khan

TL;DR

This work targets hyperspectral image classification (HSIC), where conventional transformers capture long-range spatial-spectral dependencies but carry redundant information across layers. It introduces MemFormer, a lightweight memory-enhanced transformer that combines a dynamic memory-attention mechanism with a spatial-spectral positional encoding (SSPE) and patch-based tokenization to make features more expressive while keeping the model efficient. The method reports state-of-the-art performance on benchmark datasets (IP, HC, HH) with significantly fewer parameters and competitive training times, which supports the value of memory refinement and domain-aware encoding for HSIC. The authors also outline future directions: scaling, self-supervised learning, domain adaptation, edge deployment, and multi-modal data integration.

Abstract

Hyperspectral image (HSI) classification remains challenging because of intricate spatial-spectral correlations. Existing transformer models excel at capturing long-range dependencies but often suffer from information redundancy and attention inefficiencies, which limit their ability to model the fine-grained relationships crucial for HSI classification. To overcome these limitations, we propose MemFormer, a lightweight, memory-enhanced transformer. MemFormer introduces a memory-enhanced multi-head attention mechanism that iteratively refines a dynamic memory module, enhancing feature extraction while reducing redundancy across layers. In addition, a dynamic memory enrichment strategy progressively captures complex spatial and spectral dependencies, leading to more expressive feature representations. To further improve structural consistency, we incorporate a spatial-spectral positional encoding (SSPE) tailored to HSI data, which ensures continuity without the computational burden of convolution-based approaches. Extensive experiments on benchmark datasets demonstrate that MemFormer achieves superior classification accuracy, outperforming state-of-the-art methods.
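
To make the idea of attention that reads from and refines a memory more concrete, here is a minimal sketch in plain Python. It is not MemFormer's implementation: the abstract does not give the update rule, so everything here is an assumption for illustration, including the function names (`attend`, `memory_attention_layer`), the single-head form without learned projections, and the `blend` parameter that mixes old and new memory.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def memory_attention_layer(tokens, memory, blend=0.5):
    """One hypothetical layer: tokens read memory, then memory is refined.

    Assumption: the memory update is a simple convex blend of the old
    memory and what the memory slots read back from the updated tokens.
    """
    # Tokens attend over themselves plus the memory slots.
    context = tokens + memory
    new_tokens = attend(tokens, context, context)
    # Memory slots attend over the updated tokens and blend in the result.
    read = attend(memory, new_tokens, new_tokens)
    new_memory = [
        [(1 - blend) * m + blend * r for m, r in zip(old, new)]
        for old, new in zip(memory, read)
    ]
    return new_tokens, new_memory


# Toy example: three 2-dimensional patch tokens and one memory slot,
# passed through two layers so the memory is refined iteratively.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[0.5, 0.5]]
for _ in range(2):
    tokens, memory = memory_attention_layer(tokens, memory)
```

The memory is carried from one layer to the next rather than rebuilt, which is one plausible reading of "iteratively refines a dynamic memory module". A real model would add learned query, key, and value projections and multiple heads.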

Paper Structure

This paper contains 8 sections, 18 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Overall architecture of the spatial-spectral transformer for HSIC, built on a dynamic memory-enhanced MHSA mechanism.
  • Figure 2: Effect of memory size on OA.
  • Figure 3: Training and validation accuracy and loss curves for all competing methods. The figure illustrates the convergence behavior and generalization performance of each model across training epochs.
  • Figure 4: IP dataset: Classification maps, highlighting spatial variability and class-specific performance.
  • Figure 5: HC dataset: Classification maps, highlighting spatial variability and class-specific performance.
  • ...and 1 more figure