SpACNN-LDVAE: Spatial Attention Convolutional Latent Dirichlet Variational Autoencoder for Hyperspectral Pixel Unmixing

Soham Chitnis, Kiran Mantripragada, Faisal Z. Qureshi

TL;DR

SpACNN-LDVAE extends the LDVAE pixel unmixing model with local spatial context. A Spatial Attention CNN Encoder maps a hyperspectral image patch to a Dirichlet latent that represents the abundances of the center pixel, and a Multivariate Normal decoder reconstructs the spectrum from those abundances. A softmax-based Dirichlet parameterization enforces the abundance sum-to-one constraint (ASC) and the abundance non-negativity constraint (ANC), and the model is trained with an ELBO objective whose reconstruction term is computed on the center-pixel spectrum. Results on Samson, HYDICE Urban, Cuprite, and OnTech-HSI-Syn-21 show improved endmember extraction and abundance estimation over the MLP-LDVAE baseline. For Cuprite, the model is trained on synthetic data and evaluated on the real-world scene. Because the decoder generates spectra from abundances, the approach may also aid material identification in remote sensing applications.

Abstract

Hyperspectral pixel unmixing aims to find the underlying materials (endmembers) and their proportions (abundances) in the pixels of a hyperspectral image. This work extends the Latent Dirichlet Variational Autoencoder (LDVAE) pixel unmixing scheme by taking local spatial context into account during unmixing. The proposed method uses an isotropic convolutional neural network with spatial attention to encode pixels as a Dirichlet distribution over endmembers. We evaluate our model on the Samson, HYDICE Urban, Cuprite, and OnTech-HSI-Syn-21 datasets. For the Cuprite dataset, our model also uses transfer learning: we train it on synthetic data and evaluate it on the real-world data. The results suggest that incorporating spatial context improves both endmember extraction and abundance estimation.
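
To illustrate why a Dirichlet distribution is a natural fit for abundances, the following minimal sketch draws one Dirichlet sample and checks that it is non-negative and sums to one. It is not the authors' implementation: the function names, the softplus step used to obtain positive concentration parameters, the linear mixing step, and all numeric values are assumptions made for illustration, and the paper's own parameterization and decoder may differ.

```python
import math
import random


def softplus(z):
    """Map any real number to a strictly positive value (numerically stable)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def linear_mix(abundances, endmembers):
    """Mix endmember spectra with the given abundances.

    Linear mixing is an assumption of this sketch, not a claim about the
    paper's decoder.
    """
    n_bands = len(endmembers[0])
    return [
        sum(a * e[b] for a, e in zip(abundances, endmembers))
        for b in range(n_bands)
    ]


rng = random.Random(0)

# Hypothetical raw encoder outputs for three endmembers.
encoder_output = [0.3, -1.2, 2.0]

# Positive concentration parameters (softplus is an illustrative choice).
alpha = [softplus(z) for z in encoder_output]

# One abundance vector: non-negative entries that sum to one.
abundances = sample_dirichlet(alpha, rng)

# Hypothetical endmember spectra with four bands each (made-up values).
endmembers = [
    [0.10, 0.20, 0.30, 0.40],
    [0.50, 0.50, 0.50, 0.50],
    [0.90, 0.70, 0.40, 0.20],
]

# Spectrum of a mixed pixel under the linear mixing assumption.
pixel = linear_mix(abundances, endmembers)
```

Because every sample lies on the probability simplex, no extra projection or clipping step is needed to satisfy the two abundance constraints.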

Paper Structure

This paper contains 11 sections, 8 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: CNN Latent Dirichlet Variational Autoencoder. The encoder $f$ takes an HSI patch $\mathbf{x}$ and constructs its latent representation (abundances). The decoder reconstructs the pixel spectrum from the abundances. Note that at training time the reconstruction loss is computed between the center pixel $\mathbf{x}_\text{center}$ and its reconstruction $\mathbf{\hat{x}}_\text{center}$.
  • Figure 2: Spatial Attention Convolutional Neural Network Encoder. The network takes an HSI patch $\mathbf{x}$ and returns the abundance vector $\alpha$ for the center pixel $\mathbf{x}_\text{center}$.
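
The patch-in, center-pixel-out setup described in the two captions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`extract_patch`, `center_pixel`, `mean_squared_error`), the nested-list image layout, the patch size, and the use of mean squared error as the reconstruction loss are all hypothetical choices, and the stand-in "reconstruction" is a fixed offset rather than a decoder output.

```python
def extract_patch(cube, row, col, size):
    """Return the size x size spatial patch centred on (row, col).

    `cube` is a nested list indexed as cube[row][col][band]. `size` must be
    odd, and the patch must lie fully inside the image (this sketch does no
    padding).
    """
    if size % 2 == 0:
        raise ValueError("patch size must be odd so that a centre pixel exists")
    half = size // 2
    n_rows, n_cols = len(cube), len(cube[0])
    if row - half < 0 or col - half < 0 or row + half >= n_rows or col + half >= n_cols:
        raise ValueError("patch extends beyond the image")
    return [
        cube[r][col - half: col + half + 1]
        for r in range(row - half, row + half + 1)
    ]


def center_pixel(patch):
    """Return the spectrum of the pixel at the centre of a square patch."""
    half = len(patch) // 2
    return patch[half][half]


def mean_squared_error(x, x_hat):
    """Mean squared error between two spectra (an assumed loss form)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Toy 5 x 5 image with 3 bands; each value encodes its own position.
cube = [
    [[r * 100.0 + c * 10.0 + b for b in range(3)] for c in range(5)]
    for r in range(5)
]

patch = extract_patch(cube, row=2, col=2, size=3)  # encoder input
target = center_pixel(patch)                       # spectrum to reconstruct

# Stand-in for a decoder output: the target shifted by a constant offset.
reconstruction = [v + 1.0 for v in target]
loss = mean_squared_error(target, reconstruction)
```

The point of the sketch is that the whole patch informs the encoder, while only the center pixel contributes to the reconstruction loss.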