Spatial-Spectral Diffusion Contrastive Representation Network for Hyperspectral Image Classification

Yimin Zhu, Linlin Xu

TL;DR

The paper tackles hyperspectral image classification (HSIC) under spatial-spectral heterogeneity and noise by combining denoising diffusion probabilistic models (DDPM) with contrastive learning. It introduces DiffCRN, a two-stage framework with a staged DDPM backbone featuring spatial and spectral self-attention denoising modules, an adaptive time-step selector based on pixel-level spectral angle mapping, and a fusion-enabled classifier (AWAM and CTSSFM). With unsupervised pretraining followed by supervised classification, the approach performs strongly on four standard HSI datasets, with notable gains under limited training data and robust per-class accuracy. The findings suggest that diffusion-based feature learning, combined with adaptive time-step selection and cross-time-step fusion, provides a practical and scalable pathway for HSIC in real-world applications.

Abstract

Although efficient extraction of discriminative spatial-spectral features is critical for hyperspectral image classification (HSIC), these features are difficult to obtain because of factors such as spatial-spectral heterogeneity and noise. This paper presents a Spatial-Spectral Diffusion Contrastive Representation Network (DiffCRN), based on the denoising diffusion probabilistic model (DDPM) combined with contrastive learning (CL) for HSIC, with the following characteristics. First, to improve spatial-spectral feature representation, instead of adopting the UNet-like structure widely used for DDPM, we design a novel staged architecture with a spatial self-attention denoising module (SSAD) and a spectral group self-attention denoising module (SGSAD), which improves the efficiency of spectral-spatial feature learning. Second, to improve unsupervised feature learning efficiency, we design a new DDPM model with a logarithmic absolute error (LAE) loss and CL, which improve the effectiveness of the loss function and increase instance-level and inter-class discriminability. Third, to improve feature selection, we design a learnable approach based on pixel-level spectral angle mapping (SAM) that selects time steps in the proposed DDPM model adaptively and automatically. Last, to improve feature integration and classification, we design an Adaptive Weighted Addition Module (AWAM) and a Cross Time Step Spectral-Spatial Fusion Module (CTSSFM) to fuse time-step-wise features and perform classification. Experiments on four widely used HSI datasets demonstrate the improved performance of the proposed DiffCRN over classical backbone models, state-of-the-art GAN and transformer models, and other pretrained methods. The source code and pre-trained model will be made publicly available.
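
The abstract names pixel-level spectral angle mapping (SAM) as the basis for time-step selection but does not say how the angle is used. As a minimal sketch, the standard spectral angle between two spectra $a$ and $b$ is $\arccos\big(\langle a, b\rangle / (\lVert a\rVert\,\lVert b\rVert)\big)$, computed per pixel below. The function names `spectral_angle` and `pixelwise_sam` are hypothetical. The choice of which two patches to compare (for example, an input patch and its denoised reconstruction) is an assumption, not something the abstract states.

```python
import math


def spectral_angle(a, b, eps=1e-12):
    """Return the spectral angle in radians between two equal-length spectra."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bands")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps avoids division by zero; an all-zero spectrum then yields pi / 2.
    cos_theta = dot / max(norm_a * norm_b, eps)
    # Clamp so rounding error cannot push the value outside acos's domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def pixelwise_sam(patch_a, patch_b):
    """Spectral angle for each pixel pair of two patches.

    Each patch is a flat list of pixel spectra (H*W entries of length C).
    """
    if len(patch_a) != len(patch_b):
        raise ValueError("patches must contain the same number of pixels")
    return [spectral_angle(p, q) for p, q in zip(patch_a, patch_b)]
```

Because the angle depends only on the direction of each spectrum, it is unchanged when a pixel is scaled by a positive constant, which is one reason SAM is commonly used to compare spectra under varying illumination.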

Paper Structure

This paper contains 56 sections, 28 equations, 18 figures, 11 tables.

Figures (18)

  • Figure 1: Forward and backward processes of the denoising diffusion model. $q(\mathbfcal{H}_{\boldsymbol{t}}|\mathbfcal{H}_{\boldsymbol{t-1}})$ and $p_{\theta}(\mathbfcal{H}_{\boldsymbol{t-1}}|\mathbfcal{H}_{\boldsymbol{t}})$ represent the noise-adding forward process and the denoising backward process, respectively. The essential question is how to estimate the conditional probability $q(\mathbfcal{H}_{\boldsymbol{t-1}}|\mathbfcal{H}_{\boldsymbol{t}})$.
  • Figure 2: Overall framework of the proposed DiffCRN model. The method consists of two steps. Step 1: As shown by the gray line, we pretrain the diffusion adversarial representation network $\epsilon_{\theta}$ and $\mathscr{A}_{\alpha}$ with cropped HSI patches in an unsupervised manner. Step 2: As shown by the green line, another set of cropped HSI patches is fed into the pretrained diffusion adversarial representation network, whose parameters are frozen; the features from different stages at specific time steps $t$ are then extracted. The extracted features from the same stage but different $t$ are combined by weighted addition. Finally, a classifier $\mathscr{C}_{\beta}$ is trained on the five-timestep representations of the limited labelled data to perform HSI classification.
  • Figure 3: Staged architecture of the Spatial-Spectral Denoising Adversarial Representation Learning Network. Timestep embeddings are generated by an MLP. DWConv and PWConv represent depthwise convolution and pointwise convolution, respectively. (a) The network consists of two SSAD modules and two SGSAD modules. (b) The details of the Spatial Self-Attention Denoising Module (SSAD). (c) The details of the Spectral Group Self-Attention Denoising Module (SGSAD).
  • Figure 4: (a) The Spatial Self-Attention module (SSA) computes self-attention within a local patch. (b) The Spectral Group Self-Attention module (SGSA) divides the spectral tokens into multiple groups. Attention is performed within each spectral group, with an entire patch-level spectral channel as a token. In this work, we introduce SSA into SSAD to obtain local features, and SGSA into SGSAD to obtain global features. $P$, $C$, $N_g$, and $C_g$ represent the patch size, the number of spectral channels, the number of spectral groups, and the number of spectral channels per group, respectively.
  • Figure 5: Contrastive Learning Module in DiffCRN.
  • ...and 13 more figures
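
The noise-adding forward process $q(\mathbfcal{H}_{\boldsymbol{t}}|\mathbfcal{H}_{\boldsymbol{t-1}})$ in Figure 1 can be illustrated with the standard DDPM formulation, in which each step is Gaussian with mean $\sqrt{1-\beta_t}\,\mathbfcal{H}_{\boldsymbol{t-1}}$ and variance $\beta_t$, so that $\mathbfcal{H}_{\boldsymbol{t}}$ can be drawn directly from $\mathbfcal{H}_{0}$ using $\bar{\alpha}_t=\prod_{s\le t}(1-\beta_s)$. The sketch below is a minimal pure-Python illustration under stated assumptions: the linear $\beta$ schedule and its endpoints are common DDPM defaults and are not taken from this page, and the function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps < 1:
        raise ValueError("num_steps must be positive")
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(h0, t, alpha_bars, rng):
    """Draw H_t from q(H_t | H_0) for a flat list of values; t is 1-indexed.

    Returns the noised values together with the Gaussian noise that was
    added, since a denoising network is typically trained to predict it.
    """
    if not 1 <= t <= len(alpha_bars):
        raise ValueError("t must lie in 1..T")
    a_bar = alpha_bars[t - 1]
    noise = [rng.gauss(0.0, 1.0) for _ in h0]
    h_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
           for x, e in zip(h0, noise)]
    return h_t, noise
```

A call such as `q_sample(h0, 500, cumulative_alphas(linear_beta_schedule(1000)), random.Random(0))` returns a noised copy of a flattened patch `h0` along with the injected noise. Larger `t` leaves less of the original signal, which is why the choice of time steps for feature extraction matters in Figure 2.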