
3D Fourier-based Global Feature Extraction for Hyperspectral Image Classification

Muhammad Ahmad

Abstract

Hyperspectral image classification (HSIC) has been significantly advanced by deep learning methods that exploit rich spatial-spectral correlations. However, existing approaches still face fundamental limitations: transformer-based models scale poorly because self-attention has quadratic complexity, while recent Fourier transform-based methods typically rely on 2D spatial FFTs and largely ignore the inter-band spectral dependencies inherent to hyperspectral data. To address these challenges, we propose Hybrid GFNet (HGFNet), a novel architecture that integrates localized 3D convolutional feature extraction with frequency-domain global filtering via GFNet-style blocks for efficient and robust spatial-spectral representation learning. HGFNet introduces three complementary frequency transforms tailored to hyperspectral imagery: Spectral Fourier Transform (a 1D FFT along the spectral axis), Spatial Fourier Transform (a 2D FFT over the spatial dimensions), and Spatial-Spectral Fourier Transform (a 3D FFT jointly over the spectral and spatial dimensions), enabling comprehensive, high-dimensional frequency modeling. The 3D convolutional layers capture fine-grained local spatial-spectral structures, while the Fourier-based global filtering modules efficiently model long-range dependencies and suppress noise. To further mitigate the severe class imbalance commonly observed in HSIC, HGFNet incorporates an Adaptive Focal Loss (AFL) that dynamically adjusts class-wise focusing and weighting, improving discrimination for underrepresented classes.
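The three frequency transforms named in the abstract differ only in the axes over which the FFT is taken. The following is a minimal NumPy sketch of that distinction, not the authors' implementation; the function names, the `(bands, height, width)` axis layout, and the patch size are assumptions made for illustration.

```python
import numpy as np

# Assumed layout for a hyperspectral patch: (bands, height, width).

def spectral_fft(x):
    """1D FFT along the spectral (band) axis only."""
    return np.fft.fft(x, axis=0)

def spatial_fft(x):
    """2D FFT over the two spatial axes only."""
    return np.fft.fft2(x, axes=(1, 2))

def spatial_spectral_fft(x):
    """3D FFT jointly over the spectral and spatial axes."""
    return np.fft.fftn(x, axes=(0, 1, 2))

rng = np.random.default_rng(0)
patch = rng.standard_normal((30, 11, 11))  # hypothetical patch size

f_spectral = spectral_fft(patch)
f_spatial = spatial_fft(patch)
f_joint = spatial_spectral_fft(patch)
```

Because the multidimensional DFT is separable, the 3D transform equals the 1D spectral transform applied to the 2D spatial transform. What separates the three variants in a network is therefore which frequency axes a learnable filter is allowed to act on.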

Paper Structure (12 sections, 12 equations, 4 figures, 4 tables)

This paper contains 12 sections, 12 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: HSI cubes are first divided into 3D patches to preserve local and contextual dependencies. Three consecutive 3D convolutional layers extract hierarchical features, while the GFNet layer enhances the representation by transforming spatial-spectral data into the frequency domain. After normalization, a 3D FFT decomposes the features into frequency components, where learnable global filters emphasize relevant information and suppress noise. The transformed features undergo regularization and pass through a multilayer perceptron (MLP) block, with AFL used to handle class imbalance. A residual connection ensures stable gradient flow and feature preservation before dynamic fully connected layers generate the final class probabilities.
  • Figure 2: Visual comparison of classification maps on the HC dataset. (a) Ground truth map; (b–g) predicted maps from competing models and our proposed HGFNet.
  • Figure 3: Ground truth and predicted classification maps for the Indian Pines dataset.
  • Figure 4: Ground truth and predicted classification maps for the HH dataset.
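
The frequency-domain path described in the Figure 1 caption (normalization, 3D FFT, learnable global filter, inverse transform, residual connection) can be sketched as below. This is a simplified illustration under stated assumptions, not the HGFNet implementation: the `(D, H, W, C)` feature layout, the function names, and the tensor sizes are hypothetical, and the MLP block, regularization, and learnable normalization parameters are omitted.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the channel (last) axis; no learnable scale or shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def global_filter_3d(x, weight):
    """Filter real features x of shape (D, H, W, C) in the 3D frequency domain.

    weight is a complex array of shape (D, H, W // 2 + 1, C); in a trained
    model it would be a learnable parameter (assumed here, not from the source).
    """
    d, h, w, _ = x.shape
    # Real-input 3D FFT over the spectral and spatial axes.
    freq = np.fft.rfftn(x, axes=(0, 1, 2), norm="ortho")
    # Element-wise multiplication acts as a global filter.
    freq = freq * weight
    # Back to the original domain with the original sizes.
    return np.fft.irfftn(freq, s=(d, h, w), axes=(0, 1, 2), norm="ortho")

def gf_block(x, weight):
    """Residual block: x + GlobalFilter(LayerNorm(x)); MLP omitted for brevity."""
    return x + global_filter_3d(layer_norm(x), weight)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5, 4))              # hypothetical feature map
identity = np.ones((8, 5, 5 // 2 + 1, 4), dtype=complex)  # all-pass filter
out = gf_block(x, identity)
```

With an all-pass filter (every weight equal to one) the filtering step returns its input unchanged, so the block reduces to `x + layer_norm(x)`. Element-wise multiplication in the frequency domain corresponds to a circular convolution over the whole patch, which is how a single filter can capture long-range dependencies at FFT cost rather than the quadratic cost of self-attention.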