ChaosBench: A Multi-Channel, Physics-Based Benchmark for Subseasonal-to-Seasonal Climate Prediction

Juan Nathaniel; Yongquan Qu; Tung Nguyen; Sungduk Yu; Julius Busecke; Aditya Grover; Pierre Gentine

Abstract

Accurate prediction of climate on the subseasonal-to-seasonal (S2S) scale is crucial for disaster preparedness and robust decision-making amid climate change. Yet forecasting beyond the weather timescale is challenging because it involves problems beyond initial conditions, including boundary interactions, the butterfly effect, and our inherent lack of physical understanding. Existing benchmarks tend to have shorter forecasting ranges of up to 15 days, do not include a wide range of operational baselines, and lack physics-based constraints for explainability. We therefore propose ChaosBench, a challenging benchmark to extend the predictability range of data-driven weather emulators to the S2S timescale. First, ChaosBench comprises variables beyond the typical surface-atmospheric ERA5. It also includes ocean, ice, and land reanalysis products spanning more than 45 years, allowing full Earth system emulation that respects boundary conditions. We also propose physics-based metrics, in addition to deterministic and probabilistic ones, to ensure a physically consistent ensemble that accounts for the butterfly effect. Furthermore, we evaluate a diverse set of physics-based forecasts from four national weather agencies as baselines for data-driven counterparts such as ViT/ClimaX, PanguWeather, GraphCast, and FourCastNetV2. Overall, we find that methods originally developed for weather-scale applications fail on the S2S task: their performance simply collapses to an unskilled climatology. Nonetheless, we outline and demonstrate several strategies that can extend the predictability range of existing weather emulators. These include ensembles, robust control of error propagation, and physics-informed models. Our benchmark, datasets, and instructions are available at https://leap-stc.github.io/ChaosBench.
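The abstract's central finding is that weather-scale models collapse to an unskilled climatology at S2S lead times. This is easiest to read through a skill score measured against climatology. The sketch below is a minimal illustration under stated assumptions, not ChaosBench's implementation. It assumes a regular latitude-longitude grid and cosine-latitude area weighting, a common verification convention assumed here. The skill score is defined as one minus the ratio of model RMSE to climatology RMSE, so 0 means no better than climatology and negative values mean worse. It also includes the standard ensemble CRPS estimator as one example of a probabilistic metric. The function names `lat_weighted_rmse`, `skill_vs_climatology`, and `crps_ensemble` are hypothetical.

```python
import numpy as np


def lat_weighted_rmse(pred: np.ndarray, target: np.ndarray, lat_deg: np.ndarray) -> float:
    """RMSE over a (lat, lon) grid, weighting each row by cos(latitude).

    Assumption: cosine-latitude weights normalized to mean 1, a common
    convention for gridded forecast verification.
    """
    weights = np.cos(np.deg2rad(lat_deg))
    weights = weights / weights.mean()
    sq_err = (pred - target) ** 2 * weights[:, None]
    return float(np.sqrt(sq_err.mean()))


def skill_vs_climatology(pred, target, climatology, lat_deg) -> float:
    """1 - RMSE(model) / RMSE(climatology).

    1 is a perfect forecast, 0 is no better than climatology,
    and negative values are worse than climatology.
    """
    model_err = lat_weighted_rmse(pred, target, lat_deg)
    clim_err = lat_weighted_rmse(climatology, target, lat_deg)
    return 1.0 - model_err / clim_err


def crps_ensemble(ensemble: np.ndarray, target: np.ndarray) -> float:
    """Ensemble CRPS estimator E|X - y| - 0.5 * E|X - X'|, averaged over the grid.

    `ensemble` has shape (members, lat, lon); `target` has shape (lat, lon).
    """
    spread_to_obs = np.abs(ensemble - target[None]).mean(axis=0)
    spread_within = np.abs(ensemble[:, None] - ensemble[None, :]).mean(axis=(0, 1))
    return float((spread_to_obs - 0.5 * spread_within).mean())


# Usage on synthetic data: a "forecast" equal to climatology scores zero skill.
rng = np.random.default_rng(0)
lat = np.linspace(-80.0, 80.0, 9)
clim = rng.normal(size=(9, 18))
target = clim + rng.normal(size=(9, 18))  # truth = climatology + anomaly
print(skill_vs_climatology(clim, target, clim, lat))  # 0.0
```

In these terms, a model whose performance "collapses to an unskilled climatology" would hover at or below zero skill as lead time grows.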

Paper Structure (55 sections, 29 equations, 36 figures, 10 tables)

Introduction
Related Work
ChaosBench
Observations
Simulations
Auxiliary
Benchmark Metrics
Deterministic Metrics
Physics Metrics
Probabilistic Metrics
Benchmark Results
Conclusion
Accountability and Reproducibility Statement
Getting Started
Data Preparation

Figures (36)

Figure 1: We propose ChaosBench, a large-scale, fully coupled, physics-based benchmark for subseasonal-to-seasonal (S2S) climate prediction. It is framed as a high-dimensional sequential regression task built on 45+ years of multi-system observations. These observations are used to validate physics-based and data-driven models and to train the latter. Physics-based forecasts from four national weather agencies, with a 44-day lead time, serve as baselines for data-driven forecasts. Our benchmark is one of the first to incorporate physics-based metrics to ensure physically consistent and explainable models. The blurred image at $\Delta t=44$ illustrates the challenge of long-term forecasting.
Figure 2: Physics-based simulations that couple different parts of the Earth system, along with their operational choices such as data assimilation. The numbers in brackets give the number of variables provided in ChaosBench.
Figure 3: Motivating problem: as we perform longer rollouts, (a) the residual error grows and predictions become blurry. This behavior is captured in the Fourier frequency domain. There, (b) the power spectrum $S(k)$ at low wavenumbers $k$ (i.e., low-frequency signal) remains consistent over long rollouts, but the spectrum at higher $k$ (i.e., high-frequency signal) does not. This explains why long-term forecasts capture large-scale patterns but miss fine-grained details, i.e., they are smooth.
Figure 4: Evaluation results comparing baseline climatology (black line) with physics-based control/deterministic forecasts. At longer forecasting horizons, most physics-based control/deterministic forecasts perform worse than climatology.
Figure 5: Evaluation results comparing baseline climatology (black line) with data-driven models, including PanguWeather (PW), GraphCast (GC), and FourCastNetV2 (FCN2). We find that deterministic ML models perform worse than climatology on the S2S timescale. Note: FCN2 lacks q-700.
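Figure 3 argues that long rollouts preserve power at low wavenumbers but lose it at high wavenumbers. This can be reproduced on a toy field. The sketch below is illustrative only and rests on three assumptions: a doubly periodic 2-D grid, integer wavenumber bins, and a repeated 3x3 box blur as a stand-in for the smoothing seen in long autoregressive rollouts. It is not ChaosBench's physics-based metric. The helpers `power_spectrum`, `box_blur`, and `high_k_fraction` are hypothetical names.

```python
import numpy as np


def power_spectrum(field: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Radially binned power spectrum S(k) of a doubly periodic 2-D field.

    Returns integer wavenumbers k = 1..min(ny, nx)//2 and the total Fourier
    power in each bin, where |k| is rounded to the nearest integer.
    """
    ny, nx = field.shape
    power = np.abs(np.fft.fft2(field - field.mean())) ** 2
    ky = np.fft.fftfreq(ny) * ny  # integer wavenumbers along each axis
    kx = np.fft.fftfreq(nx) * nx
    k_idx = np.rint(np.hypot(ky[:, None], kx[None, :])).astype(int)
    k_bins = np.arange(1, min(ny, nx) // 2 + 1)
    spectrum = np.array([power[k_idx == k].sum() for k in k_bins])
    return k_bins, spectrum


def box_blur(field: np.ndarray, passes: int = 5) -> np.ndarray:
    """Repeated periodic 3x3 mean filter: a crude proxy for rollout smoothing."""
    out = field.copy()
    for _ in range(passes):
        out = sum(
            np.roll(out, (dy, dx), axis=(0, 1))
            for dy in (-1, 0, 1)
            for dx in (-1, 0, 1)
        ) / 9.0
    return out


def high_k_fraction(field: np.ndarray, k_cut: int) -> float:
    """Share of spectral power at wavenumbers k >= k_cut."""
    k, s = power_spectrum(field)
    return float(s[k >= k_cut].sum() / s.sum())


rng = np.random.default_rng(0)
sharp = rng.normal(size=(64, 64))  # white noise: power spread across all k
smooth = box_blur(sharp)
print(high_k_fraction(sharp, 16), high_k_fraction(smooth, 16))
```

With white noise as input, the blurred field should keep most of its power at the lowest wavenumbers while its high-k share drops sharply, qualitatively mirroring panel (b).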
