Exploring Multi-Timestep Multi-Stage Diffusion Features for Hyperspectral Image Classification

Jingyi Zhou, Jiamu Sheng, Jiayuan Fan, Peng Ye, Tong He, Bin Wang, Tao Chen

TL;DR

This paper addresses the underuse of diffusion features in hyperspectral image (HSI) classification. It introduces MTMSD, a framework that pretrains a diffusion model on unlabeled HSI patches and extracts multi-timestep multi-stage diffusion features. These features are refined by class- and timestep-oriented feature purification and then fused by a selective timestep fusion module guided by global features. The approach outperforms both supervised and other diffusion-based methods on four public datasets, most notably Houston 2018, while reducing feature redundancy and memory cost. By combining contextual semantics and textural details from different timesteps into a unified representation, MTMSD adapts to HSI datasets with different characteristics.

Abstract

The effectiveness of spectral-spatial feature learning is crucial for the hyperspectral image (HSI) classification task. Diffusion models, as a new class of groundbreaking generative models, can learn both contextual semantics and textural details along the timestep dimension, enabling the modeling of complex spectral-spatial relations in HSIs. However, existing diffusion-based HSI classification methods use only manually selected single-timestep single-stage features, which limits the exploration and exploitation of the rich contextual semantics and textural information hidden in the diffusion model. To address this issue, we propose MTMSD, a novel diffusion-based feature learning framework that, for the first time, explores Multi-Timestep Multi-Stage Diffusion features for HSI classification. Specifically, the diffusion model is first pretrained on unlabeled HSI patches to mine the information in unlabeled data, and is then used to extract multi-timestep multi-stage diffusion features. To leverage these features effectively and efficiently, two strategies are further developed. The first is a class & timestep-oriented multi-stage feature purification module that uses inter-class and inter-timestep priors to reduce the redundancy of multi-stage features and alleviate memory constraints. The second is a selective timestep feature fusion module that, guided by global features, adaptively selects features from different timesteps to integrate texture and semantics. Both strategies improve the generality and adaptability of the MTMSD framework across the diverse patterns of different HSI data. Extensive experiments on four public HSI datasets demonstrate that our method outperforms state-of-the-art methods for HSI classification, especially on the challenging Houston 2018 dataset.
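
To make the "timestep dimension" concrete, the following is a minimal sketch of how one HSI patch could be noised at several timesteps before being passed to a pretrained denoising network. It uses the standard closed-form DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear beta schedule, the number of steps, the patch shape, and the chosen timesteps are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are assumed, not taken from the paper.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_patch(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Hypothetical patch with layout (bands, height, width).
patch = rng.standard_normal((32, 9, 9))

# Hypothetical timesteps: small t keeps fine detail, large t keeps coarse structure.
timesteps = [5, 50, 200]
noisy = {t: noise_patch(patch, t, alpha_bar, rng) for t in timesteps}
```

Each noised copy would then be fed through the denoising network, and intermediate activations at several decoder stages would be collected for every timestep; that part depends on the network architecture and is not sketched here.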

Paper Structure (37 sections, 20 equations, 13 figures, 11 tables)

Figures (13)

  • Figure 1: Overview of existing diffusion-based feature learning frameworks for the HSI classification task. Our method can fully explore and exploit the rich contextual semantics and textural features hidden in the diffusion model.
  • Figure 2: Overview of our proposed MTMSD. The method consists of two steps. Step 1: We pretrain the DDPM with HSI patches in an unsupervised manner for diffusion feature learning. Step 2: We extract multi-timestep multi-stage diffusion features from the pretrained denoising U-Net decoder and construct the timestep-wise center and global feature bank by center extraction and average pooling. To effectively and efficiently leverage multi-timestep multi-stage features, we first perform class & timestep-oriented multi-stage purification on multi-stage features in the timestep-wise center feature bank, and then, we perform selective timestep feature fusion with global-feature guidance on the purified timestep-wise center feature bank. Classification is performed through an ensemble of lightweight classifiers.
  • Figure 3: Purification index generation in our proposed class & timestep-oriented feature purification module.
  • Figure 4: Structure of the global feature-guided selective timestep network in our proposed selective timestep feature fusion module.
  • Figure 5: Classification maps obtained by different methods on the Indian Pines dataset. (a) Ground truth. (b) 2-D CNN (OA=87.77%). (c) 3-D CNN (OA=85.42%). (d) SSRN (OA=97.75%). (e) SF (OA=92.31%). (f) SSFTT (OA=97.47%). (g) GAHT (OA=97.95%). (h) 3DCAE (OA=92.69%). (i) 3DAES (OA=94.34%). (j) UMSDFL (OA=97.02%). (k) SpectralDiff (OA=98.54%). (l) MTMSD (OA=99.45%).
  • ...and 8 more figures
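
The Figure 2 caption mentions building timestep-wise center and global feature banks by center extraction and average pooling, followed by selective timestep fusion guided by global features. A rough sketch of that data flow is given below. The softmax gating over timesteps and the weight vector `w` are hypothetical stand-ins: the actual selective timestep network (Figure 4) is not described in this section, and the purification step is omitted.

```python
import numpy as np

def build_banks(features):
    """Build per-timestep center and global features for one patch.

    features: array of shape (T, C, H, W), i.e. decoder features at T timesteps.
    Returns two (T, C) arrays.
    """
    _, _, height, width = features.shape
    center_bank = features[:, :, height // 2, width // 2]  # center extraction
    global_bank = features.mean(axis=(2, 3))               # average pooling
    return center_bank, global_bank

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def selective_fusion(center_bank, global_bank, w):
    """Weight each timestep's center feature by a score from its global feature.

    w: (C,) scoring vector, a hypothetical placeholder for learned parameters.
    """
    scores = global_bank @ w       # one score per timestep, shape (T,)
    weights = softmax(scores)      # timestep weights that sum to 1
    fused = weights @ center_bank  # weighted sum over timesteps, shape (C,)
    return fused, weights

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 16, 9, 9))  # placeholder features, not real model outputs
center_bank, global_bank = build_banks(feats)
w = rng.standard_normal(16)
fused, weights = selective_fusion(center_bank, global_bank, w)
```

The fused vector would then be passed to a classifier; the caption states that the paper uses an ensemble of lightweight classifiers for this step.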