Variational Autoencoder Framework for Hyperspectral Retrievals (Hyper-VAE) of Phytoplankton Absorption and Chlorophyll a in Coastal Waters for NASA's EMIT and PACE Missions
Jiadong Lou, Bingqing Liu, Yuanheng Xiong, Xiaodong Zhang, Xu Yuan
TL;DR
The paper tackles the problem of retrieving phytoplankton optical properties, specifically $a_{phy}$ and $\text{Chl-}a$, from hyperspectral $R_{rs}$ in coastal waters where one-to-many mappings are common. It introduces Hyper-VAE, with two dedicated VAEs (VAE-$a_{phy}$ and VAE-Chl-a), to efficiently learn latent representations and generate high-fidelity, uncertainty-aware predictions for NASA's EMIT and PACE missions, comparing against MDN baselines and a modified MDN (M-MDN). Across eight performance metrics and unseen Galveston Bay data, the VAE-based approach demonstrates superior stability, lower bias, and better generalization, particularly for high-dimensional $a_{phy}$ retrievals. The study highlights the practical potential of VAEs for hyperspectral ocean color applications, outlines forward pathways including data augmentation with radiative transfer models, and suggests multi-head extensions to predict multiple IOPs simultaneously for future missions like GLIMR and SBG.
Abstract
Phytoplankton absorb and scatter light in unique ways, subtly altering the color of water. These changes are often too subtle for the human eye to detect but can be captured by sensitive ocean color instruments aboard satellites. Hyperspectral sensors, paired with advanced algorithms, are expected to significantly enhance the characterization of phytoplankton community composition, especially in coastal waters where ocean color remote sensing applications have historically encountered significant challenges. This study presents novel machine learning-based solutions for NASA's hyperspectral missions, including EMIT and PACE, tackling high-fidelity retrievals of the phytoplankton absorption coefficient ($a_{phy}$) and chlorophyll a (Chl-a) from hyperspectral remote sensing reflectance ($R_{rs}$). Given that a single $R_{rs}$ spectrum may correspond to varied combinations of inherent optical properties and associated concentrations, the Variational Autoencoder (VAE) is used as a backbone in this study to handle such multi-distribution prediction problems. For the first time, we tailor the VAE model with innovative designs to achieve hyperspectral retrievals of $a_{phy}$ and of Chl-a from hyperspectral $R_{rs}$ in optically complex estuarine-coastal waters. Validation with extensive experimental observations demonstrates superior performance of the VAE models, with high precision and low bias. The in-depth analysis of the VAE's advanced model structures and learning designs highlights the improvements and advantages of VAE-based solutions over the mixture density network (MDN) approach, particularly on high-dimensional data such as PACE data. Our study provides strong evidence that current EMIT and PACE hyperspectral data, as well as data from the upcoming Surface Biology and Geology (SBG) mission, will open new pathways toward a better understanding of phytoplankton community dynamics in aquatic ecosystems when integrated with AI technologies.
