OceanForecastBench: A Benchmark Dataset for Data-Driven Global Ocean Forecasting
Haoming Jia, Yi Han, Xiang Wang, Huizan Wang, Wei Wu, Jianming Zheng, Peikun Xiao
TL;DR
OceanForecastBench addresses the lack of open benchmarks for data-driven global ocean forecasting by providing standardized, multi-source training and evaluation datasets built from GLORYS12, ERA5, and OSTIA, supplemented with observational evaluation data from EN4, GDP, and CMEMS L3. It formalizes a multivariate forecasting task, introduces an end-to-end evaluation pipeline, and benchmarks baseline models, including PSY4, ResNet, SwinTransformer, ClimaX, and FourCastNet, on 1–10 day forecasts. The data-driven methods outperform the physics-based PSY4 at longer horizons for currents, while PSY4 remains strong for SST and near-term temperature; velocity forecasts remain a notable challenge because of data sparsity. By releasing open-source data and tooling, the benchmark supports reproducibility, fair comparison, cross-disciplinary collaboration, and future methodological improvements in ocean state forecasting.
Abstract
Global ocean forecasting aims to predict key ocean variables such as temperature, salinity, and currents, a capability essential for understanding and describing oceanic phenomena. In recent years, data-driven deep learning ocean forecast models, such as XiHe, WenHai, LangYa, and AI-GOMS, have demonstrated significant potential in capturing complex ocean dynamics and improving forecasting efficiency. Despite these advances, the absence of open-source, standardized benchmarks has led to inconsistent data usage and evaluation methods. This gap hinders efficient model development, impedes fair performance comparison, and constrains interdisciplinary collaboration. To address this challenge, we propose OceanForecastBench, a benchmark offering three core contributions: (1) high-quality global ocean reanalysis data spanning 28 years for model training, comprising 4 ocean variables across 23 depth levels and 4 sea surface variables; (2) high-reliability satellite and in-situ observations for model evaluation, covering approximately 100 million locations in the global ocean; and (3) an evaluation pipeline and a comprehensive benchmark with 6 typical baseline models, leveraging observations to evaluate model performance from multiple perspectives. OceanForecastBench represents the most comprehensive benchmarking framework currently available for data-driven ocean forecasting, offering an open-source platform for model development, evaluation, and comparison. The dataset and code are publicly available at: https://github.com/Ocean-Intelligent-Forecasting/OceanForecastBench.
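To illustrate what evaluating a gridded forecast against sparse observations can look like, the sketch below samples a forecast field at the grid cell nearest to each observation and computes the RMSE. It is not the benchmark's actual pipeline: the function names, the nearest-neighbor matching, the regular ascending latitude/longitude grid, and the choice of RMSE as the metric are all assumptions made for illustration.

```python
import numpy as np


def nearest_index(axis, values):
    """Return, for each value, the index of the nearest coordinate on `axis`.

    Assumes `axis` is 1-D and sorted in ascending order (hypothetical helper).
    """
    axis = np.asarray(axis, dtype=float)
    values = np.asarray(values, dtype=float)
    idx = np.clip(np.searchsorted(axis, values), 1, len(axis) - 1)
    left, right = axis[idx - 1], axis[idx]
    return np.where(values - left <= right - values, idx - 1, idx)


def rmse_against_observations(field, lats, lons, obs_lat, obs_lon, obs_val):
    """RMSE between a 2-D forecast field and point observations.

    Grid cells holding NaN (e.g. land) and NaN observations are skipped.
    Longitude wrap-around is not handled in this sketch.
    """
    field = np.asarray(field, dtype=float)
    obs_val = np.asarray(obs_val, dtype=float)
    i = nearest_index(lats, obs_lat)
    j = nearest_index(lons, obs_lon)
    pred = field[i, j]
    valid = np.isfinite(pred) & np.isfinite(obs_val)
    if not valid.any():
        return float("nan")
    return float(np.sqrt(np.mean((pred[valid] - obs_val[valid]) ** 2)))


# Toy example: a 3 x 4 grid with one land cell and three observations.
lats = np.array([-1.0, 0.0, 1.0])
lons = np.array([0.0, 1.0, 2.0, 3.0])
field = np.arange(12, dtype=float).reshape(3, 4)
field[2, 3] = np.nan  # pretend this cell is land

obs_lat = np.array([0.1, -0.9, 1.0])
obs_lon = np.array([1.9, 0.2, 3.0])
obs_val = np.array([7.0, 0.0, 5.0])  # the third observation falls on land

score = rmse_against_observations(field, lats, lons, obs_lat, obs_lon, obs_val)
print(score)
```

In a full evaluation the same computation would be repeated per variable, depth level, and forecast lead time; those loops are omitted here to keep the sketch short.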
