
OceanForecastBench: A Benchmark Dataset for Data-Driven Global Ocean Forecasting

Haoming Jia, Yi Han, Xiang Wang, Huizan Wang, Wei Wu, Jianming Zheng, Peikun Xiao

TL;DR

OceanForecastBench addresses the lack of open benchmarks for data-driven global ocean forecasting. It provides a standardized, multi-source training dataset built from GLORYS12, ERA5, and OSTIA, and an evaluation dataset of reliable observations from EN4, GDP, and CMEMS L3. It formalizes a multivariate forecasting task, introduces an end-to-end evaluation pipeline, and benchmarks five baselines (PSY4, ResNet, SwinTransformer, ClimaX, FourCastNet) on 1–10 day forecasts. The key findings are that data-driven methods outperform the physics-based PSY4 at longer lead times for currents, while PSY4 remains strong for SST and short-range temperature, and that velocity forecasts remain notably challenging due to data sparsity. By releasing open-source data and tooling, the benchmark improves reproducibility and cross-disciplinary collaboration and supports fair comparisons and future methodological improvements in ocean state forecasting.
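To give a sense of the task's size: if every field in the training data (4 ocean variables at 23 depth levels, plus 4 sea surface variables) is stacked as one channel of a daily gridded state, that state has 96 channels. The notation below ($\mathbf{X}_t$, $f_\theta$, the unspecified grid size $H \times W$, and a direct mapping to each lead time $\tau$) is ours and only a sketch. The paper's formalization may differ, for example by rolling a one-day model out autoregressively or by treating some surface fields as forcing rather than prediction targets.

```latex
C = \underbrace{4 \times 23}_{\text{ocean variables} \times \text{depth levels}}
  + \underbrace{4}_{\text{sea surface variables}} = 96,
\qquad
\mathbf{X}_t \in \mathbb{R}^{C \times H \times W},
\qquad
\hat{\mathbf{X}}_{t+\tau} = f_\theta(\mathbf{X}_t), \quad \tau \in \{1, \dots, 10\}\ \text{days}.
```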

Abstract

Global ocean forecasting aims to predict key ocean variables such as temperature, salinity, and currents, and is essential for understanding and describing oceanic phenomena. In recent years, data-driven ocean forecast models based on deep learning, such as XiHe, WenHai, LangYa, and AI-GOMS, have demonstrated significant potential for capturing complex ocean dynamics and improving forecasting efficiency. Despite these advances, the absence of open-source, standardized benchmarks has led to inconsistent data usage and evaluation methods. This gap hinders efficient model development, impedes fair performance comparison, and constrains interdisciplinary collaboration. To address this challenge, we propose OceanForecastBench, a benchmark offering three core contributions: (1) high-quality global ocean reanalysis data spanning 28 years for model training, covering 4 ocean variables at 23 depth levels and 4 sea surface variables; (2) high-reliability satellite and in-situ observations for model evaluation, covering approximately 100 million locations in the global ocean; and (3) an evaluation pipeline and a comprehensive benchmark of 6 typical baseline models, which uses these observations to evaluate model performance from multiple perspectives. OceanForecastBench represents the most comprehensive benchmarking framework currently available for data-driven ocean forecasting, offering an open-source platform for model development, evaluation, and comparison. The dataset and code are publicly available at: https://github.com/Ocean-Intelligent-Forecasting/OceanForecastBench.
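Evaluating against observations means first sampling each gridded forecast at the scattered observation locations and then scoring the matched pairs. The sketch below covers a single depth level only. It assumes bilinear horizontal interpolation on a regular latitude-longitude grid with no longitude wrap-around. The benchmark's actual matching scheme, including vertical interpolation to observation depths and time matching, is not described in this summary. `bilinear_to_points` and `point_scores` are hypothetical helpers, not part of the OceanForecastBench code.

```python
import numpy as np


def bilinear_to_points(field, lats, lons, obs_lat, obs_lon):
    """Bilinearly interpolate a 2-D gridded field (lat x lon) to scattered points.

    Assumptions (illustrative only): `lats` and `lons` are strictly increasing
    and uniformly spaced, and periodic longitude wrap-around is not handled.
    Land cells stored as NaN propagate to the interpolated values.
    """
    dlat, dlon = lats[1] - lats[0], lons[1] - lons[0]
    # Fractional grid coordinates of each observation.
    fi = (np.asarray(obs_lat, dtype=float) - lats[0]) / dlat
    fj = (np.asarray(obs_lon, dtype=float) - lons[0]) / dlon
    # Lower-left corner of the enclosing cell, clipped to stay inside the grid.
    i0 = np.clip(np.floor(fi).astype(int), 0, len(lats) - 2)
    j0 = np.clip(np.floor(fj).astype(int), 0, len(lons) - 2)
    wi, wj = fi - i0, fj - j0
    # Weighted sum of the four surrounding grid values.
    return ((1 - wi) * (1 - wj) * field[i0, j0]
            + (1 - wi) * wj * field[i0, j0 + 1]
            + wi * (1 - wj) * field[i0 + 1, j0]
            + wi * wj * field[i0 + 1, j0 + 1])


def point_scores(forecast_at_obs, obs):
    """Return (RMSE, mean bias) with bias defined as forecast minus observation."""
    diff = np.asarray(forecast_at_obs, dtype=float) - np.asarray(obs, dtype=float)
    diff = diff[np.isfinite(diff)]  # drop pairs that hit land or missing data
    return float(np.sqrt(np.mean(diff ** 2))), float(np.mean(diff))


# Usage on a synthetic linear field, which bilinear interpolation reproduces exactly.
lats = np.arange(-10.0, 11.0, 1.0)
lons = np.arange(0.0, 20.0, 1.0)
field = 2 * lats[:, None] + lons[None, :]
pred = bilinear_to_points(field, lats, lons, [0.5, -3.0], [3.25, 7.5])  # [4.25, 1.5]
rmse, bias = point_scores(pred, [4.25, 1.0])
```

Because both helpers are vectorized, evaluation sets on the order of 100 million locations can be processed in chunks with the same calls to keep memory bounded.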

Paper Structure

This paper contains 13 sections, 6 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Schematic of the OceanForecastBench data collection and processing pipeline.
  • Figure 2: The RMSE (lower is better), ACC (higher is better), and CSS (higher is better) of the temperature, salinity, SST, SLA, Uo and Vo for various baselines, over a forecast period ranging from 1 to 10 days. RMSE, ACC, and CSS are computed using observations provided by OceanForecastBench.
  • Figure 3: Forecast accuracy against temperature observations taken by EN4. (a) RMSE/Bias as a function of forecast lead time. The boxes are the interquartile range and the 75th percentile. (b) RMSE/Bias of 10-day forecasts as a function of date. (c) Similar to (a) with ACC as the metric. (d) Similar to (b) with ACC as the metric.
  • Figure 4: Forecast accuracy against salinity observations taken by EN4. (a) RMSE/Bias as a function of forecast lead time. The boxes are the interquartile range and the 75th percentile. (b) RMSE/Bias of 10-day forecasts as a function of date. (c) Similar to (a) with ACC as the metric. (d) Similar to (b) with ACC as the metric.
  • Figure 5: Forecast accuracy against SST observations taken by GDP. (a) RMSE/Bias as a function of forecast lead time. The boxes are the interquartile range and the 75th percentile. (b) RMSE/Bias of 10-day forecasts as a function of date. (c) Similar to (a) with ACC as the metric. (d) Similar to (b) with ACC as the metric.
  • ...and 7 more figures
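Figures 2 to 5 report RMSE, Bias, and ACC against observations as functions of lead time, with box plots summarizing the spread over forecast dates. The sketch below shows one common form of the anomaly correlation coefficient (uncentered, unweighted, with anomalies taken against a climatology sampled at the same points) and a per-lead-time median and interquartile range of daily scores. Whether the benchmark centers anomalies, applies weights, or uses a different climatology is an assumption not settled here. CSS is not sketched because its definition is not given in this summary. `acc` and `lead_time_summary` are hypothetical names.

```python
import numpy as np


def acc(forecast, obs, clim):
    """Uncentered anomaly correlation coefficient over matched points.

    Anomalies are taken against a climatology `clim` sampled at the same
    points; missing (non-finite) pairs are ignored. This is one common
    textbook form, assumed rather than taken from the benchmark code.
    """
    fa = np.asarray(forecast, dtype=float) - np.asarray(clim, dtype=float)
    oa = np.asarray(obs, dtype=float) - np.asarray(clim, dtype=float)
    ok = np.isfinite(fa) & np.isfinite(oa)
    fa, oa = fa[ok], oa[ok]
    return float(np.sum(fa * oa) / np.sqrt(np.sum(fa ** 2) * np.sum(oa ** 2)))


def lead_time_summary(scores_by_lead):
    """Median and interquartile range of per-date scores at each lead time.

    `scores_by_lead` maps a lead time in days to the daily scores (for
    example, the RMSE of each forecast date) that a box plot would summarize.
    """
    summary = {}
    for lead, scores in sorted(scores_by_lead.items()):
        q25, q50, q75 = np.percentile(scores, [25, 50, 75])
        summary[lead] = {"median": float(q50), "iqr": (float(q25), float(q75))}
    return summary


# Toy numbers for illustration, not benchmark results.
print(acc([2.0, 0.0], [3.0, -1.0], [1.0, 1.0]))  # 1.0 (anomalies are proportional)
print(lead_time_summary({1: [1, 2, 3, 4, 5]}))   # {1: {'median': 3.0, 'iqr': (2.0, 4.0)}}
```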