Variational Autoencoder Framework for Hyperspectral Retrievals (Hyper-VAE) of Phytoplankton Absorption and Chlorophyll a in Coastal Waters for NASA's EMIT and PACE Missions

Jiadong Lou, Bingqing Liu, Yuanheng Xiong, Xiaodong Zhang, Xu Yuan

TL;DR

The paper tackles the retrieval of phytoplankton optical properties, specifically the phytoplankton absorption coefficient $a_{phy}$ and chlorophyll a (Chl-a), from hyperspectral remote sensing reflectance $R_{rs}$ in coastal waters, where one-to-many mappings are common. It introduces Hyper-VAE, with two dedicated VAEs (VAE-$a_{phy}$ and VAE-Chl-a), to efficiently learn latent representations and generate high-fidelity, uncertainty-aware predictions for NASA's EMIT and PACE missions, and compares them against MDN baselines and a modified MDN (M-MDN). Across eight performance metrics and unseen Galveston Bay data, the VAE-based approach demonstrates superior stability, lower bias, and better generalization, particularly for high-dimensional $a_{phy}$ retrievals. The study highlights the practical potential of VAEs for hyperspectral ocean color applications, outlines forward pathways including data augmentation with radiative transfer models, and suggests multi-head extensions to predict multiple inherent optical properties (IOPs) simultaneously for future missions such as GLIMR and SBG.

Abstract

Phytoplankton absorb and scatter light in distinctive ways, subtly altering the color of water. These changes are often too small for the human eye to detect but can be captured by sensitive ocean color instruments aboard satellites. Hyperspectral sensors, paired with advanced algorithms, are expected to significantly enhance the characterization of phytoplankton community composition, especially in coastal waters where ocean color remote sensing applications have historically encountered significant challenges. This study presents novel machine learning-based solutions for NASA's hyperspectral missions, including EMIT and PACE, targeting high-fidelity retrievals of the phytoplankton absorption coefficient ($a_{phy}$) and chlorophyll a (Chl-a) from hyperspectral remote sensing reflectance ($R_{rs}$). Because a single $R_{rs}$ spectrum may correspond to varied combinations of inherent optical properties and associated concentrations, the Variational Autoencoder (VAE) is used as the backbone in this study to handle such multi-distribution prediction problems. For the first time, we tailor the VAE model with innovative designs to achieve retrievals of hyperspectral $a_{phy}$ and of Chl-a from hyperspectral $R_{rs}$ in optically complex estuarine-coastal waters. Validation against extensive experimental observations demonstrates superior performance of the VAE models, with high precision and low bias. An in-depth analysis of the VAE's model structures and learning designs highlights the advantages of VAE-based solutions over the mixture density network (MDN) approach, particularly on high-dimensional data such as PACE. Our study provides strong evidence that current EMIT and PACE hyperspectral data, as well as the upcoming Surface Biology and Geology (SBG) mission, will open new pathways toward a better understanding of phytoplankton community dynamics in aquatic ecosystems when integrated with AI technologies.
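
To make the one-to-many idea concrete, the following is a minimal NumPy sketch of a VAE-style forward pass that maps an $R_{rs}$ spectrum to a target spectrum through a sampled latent variable. It is not the authors' architecture: the single hidden layer, the layer sizes, the weight names (`W_enc`, `W_mu`, `W_logvar`, `W_dec1`, `W_dec2`), the squared-error reconstruction term, and the `kl_weight` parameter are all assumptions chosen for illustration, and no training loop is shown.

```python
import numpy as np


def init_params(n_in, n_hidden, n_latent, n_out, rng, scale=0.1):
    """Random weights for a toy encoder/decoder (hypothetical sizes and names)."""
    return {
        "W_enc": scale * rng.standard_normal((n_in, n_hidden)),
        "W_mu": scale * rng.standard_normal((n_hidden, n_latent)),
        "W_logvar": scale * rng.standard_normal((n_hidden, n_latent)),
        "W_dec1": scale * rng.standard_normal((n_latent, n_hidden)),
        "W_dec2": scale * rng.standard_normal((n_hidden, n_out)),
    }


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, one value per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)


def vae_forward(rrs, params, rng):
    """Encode Rrs to a latent Gaussian, draw one sample, decode to a prediction."""
    h = np.tanh(rrs @ params["W_enc"])
    mu = h @ params["W_mu"]
    log_var = h @ params["W_logvar"]
    z = reparameterize(mu, log_var, rng)
    pred = np.tanh(z @ params["W_dec1"]) @ params["W_dec2"]
    return pred, mu, log_var


def vae_loss(pred, target, mu, log_var, kl_weight=1.0):
    """Squared-error reconstruction term plus weighted KL term, averaged over the batch."""
    recon = np.mean(np.sum((pred - target) ** 2, axis=1))
    kl = np.mean(kl_to_standard_normal(mu, log_var))
    return recon + kl_weight * kl


def sample_predictions(rrs, params, rng, n_samples=50):
    """Repeat the stochastic forward pass; return per-band mean and spread."""
    draws = np.stack([vae_forward(rrs, params, rng)[0] for _ in range(n_samples)])
    return draws.mean(axis=0), draws.std(axis=0)
```

Because the latent variable is sampled, repeated calls on the same $R_{rs}$ input yield different outputs; the spread returned by `sample_predictions` is one simple way such a model could expose the ambiguity of the inversion, under the assumptions stated above.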

Paper Structure

This paper contains 23 sections, 13 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: NASA’s PACE-OCI Level 2 AOP data obtained on May 15, 2024, covers diverse water types in the northern Gulf of Mexico. The data is displayed using a band combination of remote sensing reflectance ($R_{rs}$) at 440, 560, and 650 nm and processed using the HyperCoast open-source tool (https://hypercoast.org/).
  • Figure 2: Spectral distribution of the $R_{rs}$-$a_{phy}$ dataset, with bars denoting the minimum and maximum values of (a) $R_{rs}$ ($sr^{-1}$) and (b) $a_{phy}$ ($m^{-1}$) using EMIT spectral setting.
  • Figure 3: The structure of VAE for Predicting $a_{phy}$ and Chl-a.
  • Figure 4: Scatter plots and evaluation metrics for VAE and MDN predictions of $a_{phy}$ at three representative wavelengths for PACE and EMIT. (a)–(c) VAE for PACE and EMIT at 440 nm, (d)–(f) MDN for PACE and EMIT at 440 nm, (g)–(i) VAE for PACE and EMIT at 620 nm, and (j)–(l) MDN for PACE and EMIT at 670 nm.
  • Figure 5: Performance of VAE in terms of RMSE, Log-Bias, and $\beta$ for $a_{phy}$ prediction on PACE wavelengths across 400–700 nm.
  • ...and 6 more figures