Fusing Multi- and Hyperspectral Satellite Data for Harmful Algal Bloom Monitoring with Self-Supervised and Hierarchical Deep Learning
Nicholas LaHaye, Kelly M. Luis, Michelle M. Gierach
TL;DR
This work introduces SIT-FUSE, a self-supervised, multi-sensor data fusion framework for detecting and mapping harmful algal bloom (HAB) severity and speciation across coastal regions. By integrating ocean-color reflectances from multiple satellites with TROPOMI solar-induced fluorescence (SIF) measurements and employing hierarchical deep clustering, the approach produces context-aware phytoplankton concentration maps without requiring instrument-specific labels. Validation against in-situ data in the Gulf of Mexico and Southern California demonstrates the method’s potential to deliver scalable HAB monitoring in label-scarce environments, with promising results for extending to hyperspectral inputs such as PACE OCI. The framework’s modular self-supervised learning (SSL) encoders, data fusion strategies, and hierarchical embeddings enable exploratory analysis, cross-instrument tracking, and gradual operationalization for global aquatic biogeochemistry.
Abstract
We present a self-supervised machine learning framework for detecting and mapping harmful algal bloom (HAB) severity and speciation using multi-sensor satellite data. By fusing reflectance data from operational instruments (VIIRS, MODIS, Sentinel-3, PACE) with TROPOMI solar-induced fluorescence (SIF), our framework, called SIT-FUSE, generates HAB severity and speciation products without requiring per-instrument labeled datasets. The framework employs self-supervised representation learning and hierarchical deep clustering to segment phytoplankton concentrations and speciations into interpretable classes, which are validated against in-situ data from the Gulf of Mexico and Southern California (2018-2025). Results show strong agreement with total phytoplankton, Karenia brevis, Alexandrium spp., and Pseudo-nitzschia spp. measurements. This work advances scalable HAB monitoring in label-scarce environments while enabling exploratory analysis via hierarchical embeddings: a critical step toward operationalizing self-supervised learning for global aquatic biogeochemistry.
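As a rough illustration of the fuse-then-cluster idea summarized above, the sketch below stacks per-pixel reflectance bands with a SIF value and then assigns coarse and fine cluster labels in two levels. It is not the SIT-FUSE implementation: plain k-means stands in for the hierarchical deep clustering, standardized raw features stand in for the learned self-supervised embeddings, and the function names (`fuse`, `kmeans`, `hierarchical_labels`), the cluster counts, and the random input data are all hypothetical choices made for illustration.

```python
import numpy as np


def fuse(reflectance, sif):
    """Stack co-registered per-pixel reflectance (n, bands) and SIF (n,),
    then z-score each column so no single input dominates the distances."""
    stacked = np.column_stack([reflectance, sif])
    return (stacked - stacked.mean(axis=0)) / (stacked.std(axis=0) + 1e-9)


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for deep clustering).
    Returns integer labels in [0, k)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (n, k)
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def hierarchical_labels(features, k_coarse=2, k_fine=2):
    """Two-level clustering: split all pixels into coarse classes, then
    split each coarse class again into finer sub-classes."""
    coarse = kmeans(features, k_coarse)
    fine = np.zeros(len(features), dtype=int)
    for c in range(k_coarse):
        idx = np.flatnonzero(coarse == c)
        if len(idx) >= k_fine:
            fine[idx] = kmeans(features[idx], k_fine)
    # Encode the hierarchy so that fine_id // k_fine recovers the coarse id
    return coarse, coarse * k_fine + fine


# Synthetic stand-in data: 200 pixels, 5 reflectance bands, 1 SIF value each
rng = np.random.default_rng(42)
reflectance = rng.random((200, 5))
sif = rng.random(200)

features = fuse(reflectance, sif)
coarse, fine = hierarchical_labels(features)
```

In this toy encoding each fine label maps back to its parent coarse label by integer division, which loosely mirrors how a hierarchical embedding lets a product be read at several levels of detail (for example, bloom severity at the coarse level and finer classes beneath it).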
