SPECIAL: Zero-shot Hyperspectral Image Classification With CLIP
Li Pang, Jing Yao, Kaiyu Li, Jun Zhou, Deyu Meng, Xiangyong Cao
TL;DR
This work tackles zero-shot hyperspectral image classification by introducing SPECIAL, a two-stage framework that first uses spectral-to-RGB interpolation and CLIP-based open-vocabulary segmentation to generate pseudo-labels, then refines them through noise-robust spectral learning. The method uses multi-scale resolution fusion and soft-label refinement driven by a Gaussian Mixture Model to mitigate label noise. Training begins with a warmup phase using a spectral classifier (MambaHSI), followed by a label-refinement phase that partitions samples into random, confident, and hard sets. Empirical results on three public HSIs (Pavia Centre, AeroRIT, and Chikusei) show consistent improvements over existing CLIP-based baselines in overall accuracy (OA), average accuracy (AA), and the kappa coefficient ($\kappa$), supporting the value of incorporating full spectral information and probabilistic label refinement in zero-shot HSI classification. The approach is modular and data-efficient, offering practical potential for open-vocabulary hyperspectral interpretation without manual annotations.
Abstract
Hyperspectral image (HSI) classification aims to assign each pixel in an HSI to a specific land cover class, which is crucial for applications such as remote sensing, environmental monitoring, and agriculture. Although deep learning-based HSI classification methods have achieved significant advancements, they still rely on manually labeled training data, which is time-consuming and labor-intensive to produce. To address this limitation, we introduce a novel zero-shot hyperspectral image classification framework based on CLIP (SPECIAL), aiming to eliminate the need for manual annotations. The SPECIAL framework consists of two main stages: (1) CLIP-based pseudo-label generation, and (2) noisy label learning. In the first stage, the HSI is spectrally interpolated to produce RGB bands. These bands are then classified using CLIP, yielding noisy pseudo-labels accompanied by confidence scores. To improve the quality of these labels, we propose a scaling strategy that fuses predictions from multiple spatial scales. In the second stage, spectral information and a label refinement technique are incorporated to mitigate label noise and further enhance classification accuracy. Experimental results on three benchmark datasets demonstrate that SPECIAL outperforms existing methods in zero-shot HSI classification, showing its potential for more practical applications. The code is available at https://github.com/LiPang/SPECIAL.
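
To make the first-stage idea of spectrally interpolating an HSI into RGB bands concrete, the following is a minimal sketch for a single pixel, not the authors' implementation. It assumes simple linear interpolation between neighboring sensor bands and nominal red, green, and blue center wavelengths of 650, 550, and 450 nm. The function names `interpolate_band` and `spectrum_to_rgb` and the constant `RGB_CENTERS_NM` are hypothetical and do not come from the paper or its repository.

```python
from bisect import bisect_left
from typing import Sequence, Tuple

# Assumed nominal centre wavelengths (nm) for red, green, blue.
# These values are illustrative and are not taken from the paper.
RGB_CENTERS_NM: Tuple[float, float, float] = (650.0, 550.0, 450.0)


def interpolate_band(
    wavelengths: Sequence[float],
    spectrum: Sequence[float],
    target_nm: float,
) -> float:
    """Linearly interpolate one pixel's spectrum at target_nm.

    wavelengths must be strictly increasing. Targets outside the
    sensor range are clamped to the nearest band value.
    """
    if not wavelengths or len(wavelengths) != len(spectrum):
        raise ValueError("wavelengths and spectrum must be non-empty and equal length")
    if target_nm <= wavelengths[0]:
        return float(spectrum[0])
    if target_nm >= wavelengths[-1]:
        return float(spectrum[-1])
    hi = bisect_left(wavelengths, target_nm)  # first band at or above the target
    lo = hi - 1
    t = (target_nm - wavelengths[lo]) / (wavelengths[hi] - wavelengths[lo])
    return (1.0 - t) * spectrum[lo] + t * spectrum[hi]


def spectrum_to_rgb(
    wavelengths: Sequence[float],
    spectrum: Sequence[float],
    rgb_centers_nm: Tuple[float, float, float] = RGB_CENTERS_NM,
) -> Tuple[float, float, float]:
    """Return (R, G, B) values interpolated from a single pixel's spectrum."""
    r, g, b = (interpolate_band(wavelengths, spectrum, c) for c in rgb_centers_nm)
    return (r, g, b)


if __name__ == "__main__":
    # Toy four-band pixel; real sensors have far more bands.
    bands_nm = [400.0, 500.0, 600.0, 700.0]
    pixel = [0.0, 1.0, 2.0, 3.0]
    print(spectrum_to_rgb(bands_nm, pixel))
```

Applying `spectrum_to_rgb` to every pixel would produce a three-channel image that an RGB model such as CLIP can consume. A practical pipeline would likely vectorize this over the whole cube and normalize the result to the input range the model expects.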
