Benchmarking Artificial Intelligence Models for Daily Coastal Hypoxia Forecasting

Magesh Rajasekaran; Md Saiful Sajol; Chris Alvin; Supratik Mukhopadhyay; Yanda Ou; Z. George Xue

TL;DR

This work addresses daily coastal hypoxia forecasting in the Gulf of Mexico by benchmarking four deep learning architectures under identical data preparation and evaluation protocols. The models are trained on twelve years of COAWST hindcast data (2009–2020) and tested on hindcast data from 2020 through 2024. The study frames hypoxia prediction as a binary sequence-to-one task with 7-day input windows and evaluates BiLSTM, TCN, Medformer, and ST-Transformer. The ST-Transformer achieves the strongest performance across test periods (AUC-ROC 0.982–0.992). McNemar's test confirms significant differences among several model pairs, although ST-Transformer and BiLSTM are not statistically different in every comparison. The authors provide a reproducible framework for real-time hypoxia prediction and argue that spatio-temporal modeling yields practical benefits for operational ecosystem management and resilience in coastal regions.
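To make the sequence-to-one framing concrete, the following is a minimal sketch of how 7-day input windows could be paired with a single binary label. It is not the authors' pipeline: the function name `make_windows`, the choice of the day immediately after the window as the target, and the toy feature values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int = 7,
) -> Tuple[List[List[List[float]]], List[int]]:
    """Pair each run of `window` consecutive days with one binary label.

    features: one row of predictors per day, in time order.
    labels:   one 0/1 hypoxia flag per day, aligned with `features`.
    Assumption: the target is the day right after the window; the paper
    may define the target day differently.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must cover the same days")
    inputs, targets = [], []
    for start in range(len(features) - window):
        # Days start .. start + window - 1 form the input sequence.
        inputs.append([list(row) for row in features[start:start + window]])
        # The single output is the label of the following day.
        targets.append(labels[start + window])
    return inputs, targets


# Toy data: 10 days, 2 hypothetical predictors per day.
toy_features = [[float(day), float(day) * 0.5] for day in range(10)]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
X, y = make_windows(toy_features, toy_labels)
```

With 10 days and a 7-day window, this yields three samples, each holding seven daily feature rows and one label.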

Abstract

Coastal hypoxia, especially in the northern Gulf of Mexico, presents a persistent ecological and economic concern. Seasonal models offer coarse forecasts that miss the fine-scale variability needed for daily, responsive ecosystem management. We present a study that compares four deep learning architectures for daily hypoxia classification: Bidirectional Long Short-Term Memory (BiLSTM), Medformer (Medical Transformer), Spatio-Temporal Transformer (ST-Transformer), and Temporal Convolutional Network (TCN). We trained our models on twelve years of daily hindcast data (2009-2020) from a coupled hydrodynamic-biogeochemical model, and we used hindcast data from 2020 through 2024 as test data. We constructed classification models incorporating water column stratification, sediment oxygen consumption, and temperature-dependent decomposition rates. We evaluated each architecture using the same data preprocessing, input/output formulation, and validation protocols. Each model achieved high classification accuracy and strong discriminative ability, with ST-Transformer achieving the highest performance across all metrics and test periods (AUC-ROC: 0.982-0.992). We also employed McNemar's test to identify statistically significant differences in model predictions. Our contribution is a reproducible framework for operational real-time hypoxia prediction that can support broader efforts in the environmental and ocean modeling systems community and in ecosystem resilience. The source code is available at https://github.com/rmagesh148/hypoxia-ai/
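As an illustration of the pairwise comparison mentioned in the abstract, here is a minimal sketch of McNemar's test for two classifiers scored on the same samples. It uses the exact binomial form of the test; the abstract does not say which variant the authors used (exact or chi-square with continuity correction), so that choice, the function name `mcnemar_exact`, and the toy predictions are assumptions.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
) -> Tuple[int, int, float]:
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where
      b = samples model A classifies correctly and model B does not,
      c = samples model B classifies correctly and model A does not.
    Samples on which both models agree in correctness do not enter the test.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        # The models never disagree in correctness: no evidence of a difference.
        return b, c, 1.0
    # Under the null hypothesis, each discordant sample favours either model
    # with probability 0.5, so min(b, c) follows a Binomial(n, 0.5) tail.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)


# Toy example: model A is right on all 10 samples, model B misses 5 of them.
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
model_b = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
b, c, p_value = mcnemar_exact(truth, model_a, model_b)
```

In this toy case there are five discordant samples, all favouring model A, which gives a two-sided p-value of 2 × 0.5^5 = 0.0625. Real test sets have far more discordant pairs, so the test gains power accordingly.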

Paper Structure (15 sections, 9 equations, 13 figures, 3 tables)

Introduction
Task Overview
Dataset and Data Preparation
The Data
Data Preparation Pipeline
Architectures
Methodology
Experimental Design and Validation Framework
Implementation Details
Experimental Results and Discussion
Classification Performance Analysis
Pairwise Model Comparison with McNemar's Test
Spatial Validation
Related Works
Conclusions

Figures (13)

Figure 1: The spatial domain of the hindcast model used in this study: the Louisiana-Texas shelf. The image shows random samples of COAWST hindcast hypoxia (blue) and normoxia (red) for August 2020.
Figure 2: Conceptual workflow for daily coastal hypoxia forecasting.
Figure 3: The data preparation workflow of our deep learning pipeline for time-series classification of dissolved oxygen on the Louisiana-Texas shelf.
Figure 4: ROC curves of all four models: TCN (top left), Medformer (top right), STT (bottom left), BiLSTM (bottom right).
Figure 5: PR curves of all four models: TCN (top left), Medformer (top right), STT (bottom left), BiLSTM (bottom right).
...and 8 more figures
