Global Chlorophyll-a Retrieval Algorithm from Sentinel-2 Using Residual Deep Learning and Novel Machine Learning Water Classification

Yotam Sherf, Bar Efrati, Gabriel Rozman, Moshe Harel

TL;DR

This work tackles global inland-water Chlorophyll-a (Chla) retrieval from Sentinel-2 imagery under atmospheric and optical interferences by proposing a three-stage framework. It combines a Global Water Classifier that filters reliable water pixels, a base Chla regression with XGBoost, and a residual CNN correction that models structured errors, yielding strong generalization across diverse lakes. The approach achieves R² ≈ 0.79 and MAE ≈ 13.5 mg/m³ on 867 water bodies, with a slope ≈ 0.91, demonstrating effective cross-lake transferability without site-specific tuning. Residual correction substantially improves accuracy, although very low Chla remains challenging because of atmospheric-correction artifacts; these findings inform global water-quality monitoring and policy support.
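How the three stages compose can be sketched as below. This is a minimal illustration, not the authors' code: `run_pipeline`, `classify`, `base_regressor` and `residual_model` are hypothetical callables standing in for the GWC, the XGBoost regressor and the residual CNN. It also assumes the "normalized residual" is the relative residual r = (ŷ - y)/y used for Figure 5, so a predicted r̂ is inverted as y = ŷ/(1 + r̂); the paper's exact normalization is not given here.

```python
import numpy as np


def run_pipeline(reflectance, classify, base_regressor, residual_model):
    """Compose the three stages for a batch of pixels (hypothetical interfaces).

    reflectance    : (n_pixels, n_bands) surface-reflectance array
    classify       : callable -> boolean mask, True for reliable water pixels
    base_regressor : callable -> base Chla estimate (mg/m^3) per water pixel
    residual_model : callable -> predicted normalized residual per water pixel,
                     assumed to be r = (y_base - y_true) / y_true
    """
    reflectance = np.asarray(reflectance, dtype=float)

    # Stage 1: water classification; rejected pixels are not retrieved.
    mask = np.asarray(classify(reflectance), dtype=bool)
    water = reflectance[mask]

    # Stage 2: base Chla regression (an XGBoost model in the paper).
    y_base = np.asarray(base_regressor(water), dtype=float)

    # Stage 3: residual correction. Solving r = (y_base - y) / y for y
    # gives y = y_base / (1 + r), applied with the predicted residual.
    r_hat = np.asarray(residual_model(water, y_base), dtype=float)
    y_corrected = y_base / (1.0 + r_hat)

    # Non-water pixels are reported as NaN rather than dropped,
    # so the output stays aligned with the input rows.
    chla = np.full(reflectance.shape[0], np.nan)
    chla[mask] = y_corrected
    return chla


if __name__ == "__main__":
    # Toy stand-ins: a bright first band marks non-water, and the residual
    # model says the base estimate is 25% too high.
    pixels = np.array([[0.02, 0.03], [0.50, 0.60], [0.04, 0.05]])
    chla = run_pipeline(
        pixels,
        classify=lambda x: x[:, 0] < 0.1,
        base_regressor=lambda x: 100.0 * x[:, 1],
        residual_model=lambda x, y: np.full(len(y), 0.25),
    )
    print(chla)
```

Keeping rejected pixels as NaN, rather than dropping them, is a design choice that keeps outputs aligned with the input scene.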

Abstract

We present the Global Water Classifier (GWC), a supervised, geospatially extensive Machine Learning (ML) classifier trained on Sen2Cor-corrected Sentinel-2 surface reflectance data. Developed using nearly 100 globally distributed inland water bodies, the GWC distinguishes water across a range of Chlorophyll-a (Chla) levels from non-water spectra (clouds, sun glint, snow, ice, aquatic vegetation, land, and sediments) and shows geographically stable performance. Building on this classifier, we perform Chla retrieval using Sentinel-2 reflectance data matched up with the United States Geological Survey (USGS) AquaMatch in-situ dataset, which covers diverse geographical and hydrological conditions. We train an XGBoost regressor on 13,626 matchup points. Scenes labeled positive by the GWC consistently outperform those labeled negative and yield more accurate Chla retrievals, which confirms the classifier's advantage in reducing various interferences. Residual analysis of the regression predictions then revealed structured errors, motivating a residual CNN (RCNN) correction stage trained on normalized residuals, which yields a substantial improvement. Our algorithm was tested on 867 water bodies with over 2,000 predictions and Chla values up to 1,000 mg/m³, achieving R² = 0.79, MAE = 13.52 mg/m³, and slope = 0.91, demonstrating robust, scalable, and globally transferable performance without additional tuning.
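The three reported scores can be computed from paired in-situ and predicted Chla as sketched below. The definitions are assumptions, since the abstract does not spell them out: R² is taken as the coefficient of determination 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation), and "slope" as the least-squares slope of predicted against observed Chla. The function name `retrieval_metrics` is hypothetical.

```python
import numpy as np


def retrieval_metrics(y_true, y_pred):
    """R^2, MAE and slope for Chla retrievals (assumed definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_pred - y_true

    # Mean absolute error, in the units of the inputs (mg/m^3 for Chla).
    mae = float(np.mean(np.abs(residual)))

    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = float(1.0 - ss_res / ss_tot)

    # Assumed: least-squares slope of predicted vs. observed Chla;
    # np.polyfit returns [slope, intercept] for a degree-1 fit.
    slope = float(np.polyfit(y_true, y_pred, 1)[0])

    return {"R2": r2, "MAE": mae, "slope": slope}
```

A slope below 1 under this definition indicates that predictions are compressed toward the mean, that is, high Chla tends to be underestimated and low Chla overestimated.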

Paper Structure

This paper contains 13 sections, 12 equations, 15 figures, and 5 tables.

Figures (15)

  • Figure 1: The map illustrates the global classification areas, highlighting the broad geographical spread. Classification was carried out across nearly 100 different water bodies, spanning diverse climatic conditions and geographical regions, with more than 500 classification points in total.
  • Figure 2: Classification pipeline workflow. The scheme illustrates the iterative procedure used to train and update the GWC.
  • Figure 3: Workflow of the residual correction architecture described in Sections \ref{step3} and \ref{secres}.
  • Figure 4: Chla retrieval performance of the base XGB regression model. A subsample of 1114 datapoints from each class is shown. The positive class significantly outperforms the negative class, as summarized in Table \ref{tab:metrics_class}.
  • Figure 5: Relative residuals, computed as $(y^{\text{pred}}_{\text{XGB}} - y^{\text{true}})/y^{\text{true}}$, with the 50th, 75th, and 90th percentiles shown. The relative residuals increase as Chla values decrease. A quantitative assessment of the spread is summarized in Table \ref{tab:cv_entropym}.
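The spread behavior described for Figure 5 can be tabulated by binning samples on observed Chla and taking percentiles of the relative residual within each bin. This is a sketch under stated assumptions: percentiles are taken over the absolute relative residual, bin edges are supplied by the user, and `relative_residual_percentiles` is a hypothetical helper, not the authors' procedure.

```python
import numpy as np


def relative_residual_percentiles(y_true, y_pred, bin_edges, q=(50, 75, 90)):
    """Percentiles of |relative residual| per observed-Chla bin (sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Relative residual as defined for Figure 5: (y_pred - y_true) / y_true.
    rel = (y_pred - y_true) / y_true

    # For increasing edges, np.digitize returns i with
    # bin_edges[i-1] <= y < bin_edges[i]; 0 and len(bin_edges)
    # mark values outside the edges, which are skipped below.
    idx = np.digitize(y_true, bin_edges)

    table = {}
    for b in range(1, len(bin_edges)):
        in_bin = idx == b
        if in_bin.any():
            # Assumed: spread measured on the absolute relative residual.
            table[(bin_edges[b - 1], bin_edges[b])] = np.percentile(
                np.abs(rel[in_bin]), q
            )
    return table
```

Under this tabulation, the trend in Figure 5 would appear as larger percentile values in the low-Chla bins than in the high-Chla bins.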