Statistical post-processing yields accurate probabilistic forecasts from Artificial Intelligence weather models

Belinda Trotta, Robert Johnson, Catherine de Burgh-Day, Debra Hudson, Esteban Abellan, James Canvin, Andrew Kelly, Daniel Mentiplay, Benjamin Owen, Jennifer Whelan

TL;DR

The paper tackles biases and reliability issues in AI-based weather forecasts by applying the Bureau of Meteorology's IMPROVER statistical post-processing system to ECMWF's Artificial Intelligence Forecasting System (AIFS) and comparing the results with post-processed output from the ECMWF HRES and ENS NWP models. It demonstrates that IMPROVER improves both expected-value and probabilistic outputs for AIFS without any change to existing workflows, and that blending AIFS with NWP forecasts yields further gains in forecast skill. Key findings include calibration and CRPS (continuous ranked probability score) improvements for AIFS comparable to those for NWP, and consistent benefits from including AIFS in blended forecasts. The study offers national meteorological centres a practical, low-risk, incremental pathway for integrating AI-based forecasts into current operational systems.

Abstract

Artificial Intelligence (AI) weather models are now reaching operational-grade performance for some variables, but like traditional Numerical Weather Prediction (NWP) models, they exhibit systematic biases and reliability issues. We test the application of the Bureau of Meteorology's existing statistical post-processing system, IMPROVER, to ECMWF's deterministic Artificial Intelligence Forecasting System (AIFS), and compare results against post-processed outputs from the ECMWF HRES and ENS models. Without any modification to processing workflows, post-processing yields accuracy improvements for AIFS comparable to those for traditional NWP forecasts, in both expected value and probabilistic outputs. We show that blending AIFS with NWP models improves overall forecast skill, even when AIFS alone is not the most accurate component. These findings show that statistical post-processing methods developed for NWP are directly applicable to AI models, enabling national meteorological centres to incorporate AI forecasts into existing workflows in a low-risk, incremental fashion.

Paper Structure

This paper contains 7 sections, 1 equation, 16 figures, 6 tables.

Figures (16)

  • Figure 1: The Australian continent, showing locations of the Bureau of Meteorology's Automatic Weather Stations used for verification in this study.
  • Figure 2: The processing workflow. The flowchart on the left shows the main steps of the analysis. These steps are duplicated for each input model (ENS, HRES, and AIFS). Inputs and outputs of selected steps are shown in bold. The IMPROVER processing is expanded in more detail on the right.
  • Figure 3: Blending weights for the all-model blend (top row) and NWP-model blend (ENS and HRES only; bottom row). The dotted line shows the optimal weights per lead hour. The solid line shows a smoothed version where a piecewise-linear function is fitted to the optimal weights. These weights are calculated on the first half of the dataset, and are used to calculate the blended output for the second half.
  • Figure 4: Raw forecasts (top row) and post-processed expected value outputs from IMPROVER (bottom row) for temperature at lead time 18H and valid time 2024-06-15 06:00 UTC (16:00 Australian Eastern Standard Time). The left column is HRES and the right AIFS. Units are degrees Celsius. Note that the raw forecasts are on a latitude/longitude grid, while the post-processed forecasts use the Albers equal area projection. The diagonal artefacts near the edges in the post-processed forecasts occur because calibration is done against the MSAS analysis, which has a more limited spatial domain.
  • Figure 5: Mean squared error by lead day for raw (dashed line) and post-processed (solid line) models ENS (blue), HRES (red) and AIFS (green), for temperature (left), dew point (middle) and wind speed (right). The calculation includes only lead times that are present in both the raw and post-processed forecasts (that is, those that are multiples of 6 hours).
  • ...and 11 more figures
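
The Figure 3 caption describes fitting blending weights on the first half of the dataset and applying them to the second half. The sketch below illustrates that train/apply split for a single lead time on synthetic data, and shows why a blend can beat its best component when the components' errors are not perfectly correlated. Everything here is an assumption made for illustration: the two stand-in models, their error levels, the least-squares weight criterion, and the helper names `mse`, `optimal_weight`, and `blend` are hypothetical and are not taken from the paper or from IMPROVER. The paper blends three models, computes weights per lead hour, and smooths them with a piecewise-linear fit; none of that is reproduced here.

```python
import random


def mse(forecast, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth)


def optimal_weight(f_a, f_b, truth):
    """Least-squares weight w on f_a for the blend w*f_a + (1-w)*f_b.

    Minimising sum((w*a + (1-w)*b - t)^2) over w gives
    w = sum((a-b)*(t-b)) / sum((a-b)^2); the result is clipped to [0, 1].
    This criterion is an assumption, not necessarily the paper's.
    """
    num = sum((a - b) * (t - b) for a, b, t in zip(f_a, f_b, truth))
    den = sum((a - b) ** 2 for a, b in zip(f_a, f_b))
    w = num / den if den > 0 else 0.5
    return min(1.0, max(0.0, w))


def blend(f_a, f_b, w):
    """Weighted average of two forecasts with weight w on f_a."""
    return [w * a + (1 - w) * b for a, b in zip(f_a, f_b)]


# Synthetic "observations" and two hypothetical models with independent errors.
rng = random.Random(0)
n = 4000
truth = [rng.gauss(15.0, 5.0) for _ in range(n)]
model_a = [t + rng.gauss(0.0, 1.0) for t in truth]  # more accurate component
model_b = [t + rng.gauss(0.0, 1.3) for t in truth]  # less accurate component

# Fit the weight on the first half, evaluate on the second half.
half = n // 2
w = optimal_weight(model_a[:half], model_b[:half], truth[:half])

test_truth = truth[half:]
mse_a = mse(model_a[half:], test_truth)
mse_b = mse(model_b[half:], test_truth)
mse_blend = mse(blend(model_a[half:], model_b[half:], w), test_truth)

print(f"weight on model_a: {w:.2f}")
print(f"MSE model_a={mse_a:.3f} model_b={mse_b:.3f} blend={mse_blend:.3f}")
```

In this toy setup the less accurate model still receives a non-zero weight, and the held-out blended error falls below that of either component, which is consistent with the abstract's observation that a model can improve a blend without being its most accurate member.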