SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking

Soukayna Mouatadid, Paulo Orenstein, Genevieve Flaspohler, Miruna Oprescu, Judah Cohen, Franklyn Wang, Sean Knight, Maria Geogdzhayeva, Sam Levang, Ernest Fraenkel, Lester Mackey

TL;DR

Subseasonal forecasting (2–6 weeks ahead) remains challenging due to limited dynamical model skill and heterogeneous drivers. The authors introduce SubseasonalClimateUSA, a curated, regularly updated dataset combining ground-truth observations, subseasonal drivers, and multi-model dynamical forecasts for the contiguous U.S., with a Python interface for easy access. They benchmark a wide suite of models, including simple baselines, adaptive bias-correction (ABC) hybrids, and diverse ML/DL forecasters, across four canonical subseasonal tasks (temperature and precipitation for weeks 3–4 and 5–6). The results show that simple ABC-based corrections to operational models often yield the best RMSE and skill, with ensembles providing further gains, and that the dataset can both accelerate model development and serve as standardized benchmarks for progress in subseasonal forecasting.

Abstract

Subseasonal forecasting of the weather two to six weeks in advance is critical for resource allocation and advance disaster notice but poses many challenges for the forecasting community. At this forecast horizon, physics-based dynamical models have limited skill, and the targets for prediction depend in a complex manner on both local weather variables and global climate variables. Recently, machine learning methods have shown promise in advancing the state of the art but only at the cost of complex data curation, integrating expert knowledge with aggregation across multiple relevant data sources, file formats, and temporal and spatial resolutions. To streamline this process and accelerate future development, we introduce SubseasonalClimateUSA, a curated dataset for training and benchmarking subseasonal forecasting models in the United States. We use this dataset to benchmark a diverse suite of models, including operational dynamical models, classical meteorological baselines, and ten state-of-the-art machine learning and deep learning-based methods from the literature. Overall, our benchmarks suggest simple and effective ways to extend the accuracy of current operational models. SubseasonalClimateUSA is regularly updated and accessible via the subseasonal_data Python package (https://github.com/microsoft/subseasonal_data/).
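
The benchmark comparisons are reported in terms of RMSE and percentage improvement over a baseline's RMSE (for example, improvement over debiased CFSv2 in the Figure 4 caption). As a rough illustration of how such a comparison could be computed, here is a minimal sketch. The function names `rmse` and `pct_improvement` and the toy arrays are hypothetical and are not part of the `subseasonal_data` package, and the exact averaging over grid points and target dates used in the paper may differ.

```python
import numpy as np


def rmse(forecast, truth):
    """Root mean squared error between a forecast and the observed values."""
    forecast = np.asarray(forecast, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((forecast - truth) ** 2)))


def pct_improvement(model_rmse, baseline_rmse):
    """Percentage reduction in RMSE relative to a baseline (positive is better)."""
    return 100.0 * (1.0 - model_rmse / baseline_rmse)


# Toy values standing in for gridded observations and forecasts (assumed data).
truth = np.array([1.0, 2.0, 3.0, 4.0])
baseline_forecast = truth + 2.0  # hypothetical baseline, off by 2 everywhere
model_forecast = truth + 1.0     # hypothetical corrected model, off by 1

baseline_rmse = rmse(baseline_forecast, truth)
model_rmse = rmse(model_forecast, truth)
improvement = pct_improvement(model_rmse, baseline_rmse)
print(f"baseline RMSE={baseline_rmse:.2f}, model RMSE={model_rmse:.2f}, "
      f"improvement={improvement:.1f}%")
```

Under this convention, a model that halves the baseline's RMSE scores a 50% improvement, and a model that is no better than the baseline scores 0% or less.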

Paper Structure

This paper contains 48 sections, 2 equations, 21 figures, 8 tables, and 3 algorithms.

Figures (21)

  • Figure 1: Schematic of the SubseasonalClimateUSA data collection and processing pipeline.
  • Figure 2: Example of SubseasonalClimateUSA observations and dynamical model forecasts.
  • Figure 3: Per-season and per-year average skill and improvement over mean debiased CFSv2 RMSE across the contiguous U.S. and the years 2011–2020. Despite their simplicity, the ABC models (solid lines) consistently outperform debiased CFSv2 and the state-of-the-art learners (dotted lines).
  • Figure 4: Percentage improvement over mean debiased CFSv2 RMSE in the contiguous U.S. over 2011–2020. White grid points indicate negative or 0% improvement.
  • Figure 5: Climatology++ hyperparameters automatically selected for each target date in 2011–2020.
  • ...and 16 more figures