Latent Dirichlet Transformer VAE for Hyperspectral Unmixing with Bundled Endmembers
Giancarlo Giannetti, Faisal Z. Qureshi
TL;DR
This work tackles hyperspectral unmixing by integrating a transformer-based encoder with a Dirichlet latent prior to enforce physically meaningful abundance constraints. The LDVAE-T framework introduces a decoder that predicts distributional endmembers in the form of mean spectra and structured segment-wise covariances, so that endmember variability is captured without relying on fixed spectra. Training optimizes an ELBO objective plus supervised abundance and endmember losses, with an annealing scheme to balance these terms; the latent abundances are given a Dirichlet prior p(z) = Dir(α^{prior}). Evaluations on Samson, Jasper Ridge, and HYDICE Urban report state-of-the-art RMSE for abundance estimation and SAD for endmember extraction, highlighting the benefits of learned endmember bundles and the Dirichlet latent space for physically plausible and accurate unmixing.
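For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of their textbook definitions. The function names are hypothetical, and the exact averaging convention used in the paper (per endmember, per image, or overall) is not stated here, so this should be read as an illustration rather than the authors' evaluation code.

```python
import numpy as np


def abundance_rmse(a_true, a_pred):
    """Root mean squared error between two abundance arrays of equal shape.

    Assumption: the error is averaged over all entries; the paper may
    average per endmember instead.
    """
    a_true = np.asarray(a_true, dtype=float)
    a_pred = np.asarray(a_pred, dtype=float)
    return float(np.sqrt(np.mean((a_true - a_pred) ** 2)))


def spectral_angle_distance(e_true, e_pred, eps=1e-12):
    """Angle in radians between two spectra given as 1-D arrays."""
    e_true = np.asarray(e_true, dtype=float)
    e_pred = np.asarray(e_pred, dtype=float)
    denom = np.linalg.norm(e_true) * np.linalg.norm(e_pred) + eps
    cos = np.dot(e_true, e_pred) / denom
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Because the spectral angle depends only on the direction of a spectrum, it is insensitive to a uniform brightness scaling, which is one reason it is commonly used to compare endmembers.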
Abstract
Hyperspectral images capture rich spectral information that enables per-pixel material identification; however, spectral mixing often obscures pure material signatures. To address this challenge, we propose the Latent Dirichlet Transformer Variational Autoencoder (LDVAE-T) for hyperspectral unmixing. Our model combines the global context modeling capabilities of transformer architectures with physically meaningful constraints imposed by a Dirichlet prior in the latent space. This prior naturally enforces the sum-to-one and non-negativity conditions essential for abundance estimation, thereby improving the quality of predicted mixing ratios. A key contribution of LDVAE-T is its treatment of materials as bundled endmembers, rather than relying on fixed ground truth spectra. The decoder predicts, for each endmember and each patch, a mean spectrum together with a structured (segmentwise) covariance that captures correlated spectral variability. Reconstructions are formed by mixing these learned bundles with Dirichlet-distributed abundances produced by a transformer encoder, allowing the model to represent intrinsic material variability while preserving physical interpretability. We evaluate our approach on three benchmark datasets (Samson, Jasper Ridge, and HYDICE Urban) and show that LDVAE-T consistently outperforms state-of-the-art models in abundance estimation and endmember extraction, as measured by root mean squared error and spectral angle distance, respectively.
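The reconstruction step described above can be illustrated with a small NumPy sketch. It is not the LDVAE-T implementation: there is no transformer or training loop, and all sizes, names, and random values are hypothetical. Two modeling choices are assumptions made for illustration only: the "segmentwise" covariance is read as block-diagonal over contiguous band segments, and the mixing is taken to be linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
n_endmembers = 3
n_bands = 12
segment_len = 4  # assumption: bands are grouped into contiguous segments

# Abundances drawn from a Dirichlet are non-negative and sum to one.
alpha = np.full(n_endmembers, 1.0)  # stand-in for encoder-predicted parameters
abundances = rng.dirichlet(alpha)

# Stand-in for the decoder's predicted mean spectrum per endmember.
means = rng.uniform(0.1, 0.9, size=(n_endmembers, n_bands))


def segmentwise_covariance(n_bands, segment_len, rng, scale=1e-3):
    """Build a block-diagonal covariance: bands correlate only within a segment.

    Assumption: this is one plausible reading of "segmentwise covariance".
    """
    cov = np.zeros((n_bands, n_bands))
    for start in range(0, n_bands, segment_len):
        stop = min(start + segment_len, n_bands)
        a = rng.normal(size=(stop - start, stop - start))
        # a @ a.T + I is symmetric positive definite.
        cov[start:stop, start:stop] = scale * (a @ a.T + np.eye(stop - start))
    return cov


covs = np.stack(
    [segmentwise_covariance(n_bands, segment_len, rng) for _ in range(n_endmembers)]
)

# Draw one spectrum per endmember from its bundle (mean plus covariance).
endmembers = np.stack(
    [rng.multivariate_normal(means[k], covs[k]) for k in range(n_endmembers)]
)

# Assumed linear mixing: the pixel is an abundance-weighted sum of endmembers.
pixel = abundances @ endmembers
```

Resampling `endmembers` with the same `abundances` yields slightly different pixels, which is the sense in which a bundle represents material variability rather than a single fixed spectrum.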
