Label Semantics for Robust Hyperspectral Image Classification
Rafin Hassan, Zarin Tasnim Roshni, Rafiqul Bari, Alimul Islam, Nabeel Mohammed, Moshiur Farazi, Shafin Rahman
TL;DR
This work tackles the challenge of hyperspectral image classification under limited labeled data by introducing S3FN, a two-stage framework that fuses spectral–spatial features from a 3D-CNN with semantic label embeddings derived from rich, LLM-generated class descriptions. LLM prompts produce descriptive texts for each class, which are encoded by transformer-based text encoders to form semantic embeddings that guide alignment with image features. The architecture enables robust feature–label alignment and uses patch-level voting to improve image-level predictions; experiments on the wood, blueberries, and DeepHS-Fruit datasets show performance gains and offer insights into encoder choices. Overall, the approach demonstrates that contextual linguistic information can meaningfully augment hyperspectral classification, offering better generalization across diverse domains and paving the way for semantically guided HSI analysis in agriculture, environmental monitoring, and beyond.
Abstract
Hyperspectral imaging (HSI) classification is a critical tool with widespread applications across diverse fields such as agriculture, environmental monitoring, medicine, and materials science. Due to the limited availability of high-quality training samples and the high dimensionality of spectral data, HSI classification models are prone to overfitting and often face challenges in balancing accuracy and computational complexity. Furthermore, most HSI classification models are monomodal: they rely solely on spectral-spatial data to learn decision boundaries in the high-dimensional embedding space. To address this, we propose a general-purpose Semantic Spectral-Spatial Fusion Network (S3FN) that uses contextual, class-specific textual descriptions to complement the training of an HSI classification model. Specifically, S3FN leverages LLMs to generate comprehensive textual descriptions for each class label that capture its unique characteristics and spectral behaviors. These descriptions are then embedded into a vector space using a pre-trained text encoder such as BERT or RoBERTa to extract meaningful label semantics, which in turn lead to better feature-label alignment and improved classification performance. To demonstrate the effectiveness of our approach, we evaluate our model on three diverse HSI benchmark datasets (Hyperspectral Wood, HyperspectralBlueberries, and DeepHS-Fruit) and report a significant performance boost. Our results highlight the synergy between textual semantics and spectral-spatial data, paving the way for further advancements in semantically augmented HSI classification models. Code is available at: https://github.com/milab-nsu/S3FN
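
The idea of aligning image features with label embeddings and then voting over patches can be illustrated with a minimal sketch. This is not the S3FN implementation: it assumes that patch features and label embeddings already live in the same vector space, that alignment is scored by cosine similarity, and that patch-level voting is a simple majority vote. The function names and the toy vectors (stand-ins for 3D-CNN features and BERT/RoBERTa label embeddings) are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def classify_patches(patch_features, label_embeddings):
    """Assign each patch to the class whose label embedding is most similar.

    Assumption: cosine similarity is used as the alignment score.
    patch_features:   (n_patches, d) array
    label_embeddings: (n_classes, d) array
    """
    scores = l2_normalize(patch_features) @ l2_normalize(label_embeddings).T
    return scores.argmax(axis=1)


def vote(patch_predictions, n_classes):
    """Assumption: the image-level label is the majority vote over its patches."""
    counts = np.bincount(patch_predictions, minlength=n_classes)
    return int(counts.argmax())


# Toy stand-ins: two classes, three patches from one image, embedding size 3.
label_embeddings = np.array([[1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0]])
patch_features = np.array([[0.9, 0.1, 0.0],
                           [0.8, 0.3, 0.1],
                           [0.2, 0.7, 0.0]])

patch_preds = classify_patches(patch_features, label_embeddings)
image_pred = vote(patch_preds, n_classes=2)
print(patch_preds, image_pred)
```

In this toy case two of the three patches are closest to the first label embedding, so the vote assigns the image to class 0. In a real pipeline the label embeddings would come from encoding the LLM-generated class descriptions, and a learned projection would typically be needed to map image features into that space.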
