Aardvark weather: end-to-end data-driven weather forecasting

Anna Vaughan, Stratis Markou, Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, Nicholas D. Lane, Matthew Chantry, J. Scott Hosking, Richard E. Turner

TL;DR

Aardvark Weather is an end-to-end data-driven weather forecasting system that replaces the traditional numerical weather prediction pipeline with a neural-process-based model. An encoder maps raw observations to a gridded initial state, a processor generates global forecasts autoregressively, and a decoder produces local station forecasts. The model is trained with ERA5 data and can be fine-tuned end-to-end for specific regions or variables. Its gridded global forecasts outperform or closely match major operational baselines (GFS and HRES) for numerous variables and lead times, while local station forecasts remain skillful up to about 10 days and can surpass post-processed baselines in several regions. The approach is computationally lightweight and adaptable, enabling rapid creation of bespoke regional models and offering substantial potential for widening access to advanced forecasting, including in the developing world.
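
The encoder, processor and decoder stages described above can be illustrated with a minimal sketch of the data flow. Everything here is an assumption made for illustration: the function names (`encode`, `process`, `decode`, `rollout`) are hypothetical, and each stage is a trivial stand-in (cell averaging, damped persistence, grid lookup) rather than the neural networks used in Aardvark. Only the structure is taken from the description: observations are mapped to a gridded state, the state is stepped forward autoregressively, and station values are read off each gridded state.

```python
import numpy as np

def encode(observations, grid_shape):
    """Toy stand-in for the encoder: average raw observations onto a grid.

    observations: iterable of (row, col, value) tuples.
    Grid cells with no observation are set to 0.
    """
    total = np.zeros(grid_shape)
    count = np.zeros(grid_shape)
    for row, col, value in observations:
        total[row, col] += value
        count[row, col] += 1
    # Divide only where at least one observation fell in the cell.
    return np.divide(total, count, out=np.zeros(grid_shape), where=count > 0)

def process(state):
    """Toy stand-in for the processor: one step of damped persistence."""
    return 0.9 * state

def decode(state, station_cells):
    """Toy stand-in for the decoder: read the grid at each station's cell."""
    return np.array([state[row, col] for row, col in station_cells])

def rollout(observations, grid_shape, station_cells, n_steps):
    """Run encoder once, then apply the processor autoregressively."""
    state = encode(observations, grid_shape)  # estimated state at t = 0
    gridded = [state]
    for _ in range(n_steps):
        state = process(state)  # each forecast is fed back in as input
        gridded.append(state)
    # Decode every gridded state (t = 0 and all lead times) to stations.
    station = [decode(s, station_cells) for s in gridded]
    return np.stack(gridded), np.stack(station)

observations = [(0, 0, 2.0), (0, 0, 4.0), (1, 1, 10.0)]
gridded, station = rollout(observations, (2, 2), [(0, 0), (1, 1)], n_steps=2)
```

The point of the sketch is the composition: because each stage consumes the previous stage's output, a loss computed on the station values could in principle be propagated back through all three stages, which is what makes end-to-end fine-tuning possible in a learned version of this pipeline.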

Abstract

Weather forecasting is critical for a range of human activities including transportation, agriculture, and industry, as well as the safety of the general public. Machine learning models have the potential to transform the complex weather prediction pipeline, but current approaches still rely on numerical weather prediction (NWP) systems, limiting forecast speed and accuracy. Here we demonstrate that a machine learning model can replace the entire operational NWP pipeline. Aardvark Weather, an end-to-end data-driven weather prediction system, ingests raw observations and outputs global gridded forecasts and local station forecasts. Further, it can be optimised end-to-end to maximise performance over quantities of interest. Global forecasts outperform an operational NWP baseline for multiple variables and lead times. Local station forecasts are skillful up to ten days lead time and achieve errors comparable to, and often lower than, those of a post-processed global NWP baseline and a state-of-the-art end-to-end forecasting system with input from human forecasters. These forecasts are produced with a remarkably simple neural process model using just 8% of the input data and three orders of magnitude less compute than existing NWP and hybrid AI-NWP methods. We anticipate that Aardvark Weather will be the starting point for a new generation of end-to-end machine learning models for medium-range forecasting that will reduce computational costs by orders of magnitude and enable the rapid and cheap creation of bespoke models for users in a variety of fields, including for the developing world where state-of-the-art local models are not currently available.

Paper Structure (7 sections, 11 equations, 30 figures)

Figures (30)

  • Figure 1: The weather prediction pipeline and Aardvark Weather. Top: Illustration of the conventional end-to-end weather prediction pipeline. First, in the atmospheric state estimation stage (turquoise), observational data from a range of sources are used to predict the atmospheric state for multiple variables (left column of globes). This is used as the initial condition for the forecasting stage, which predicts the atmospheric state at future lead times (right column of globes). Finally, the resulting predictions are post-processed using statistical methods or further local NWP models and used for downstream applications, e.g. generating local forecasts. Bottom: Illustration of the operation of Aardvark at deployment time. First, an encoder module uses raw observations as input to estimate the initial state of the atmosphere across key variables at $t = 0$. Next, a processor module ingests this estimated state to produce a forecast at the next lead time $t = \delta t$. Forecasts at subsequent lead times are produced autoregressively. Finally, a decoder module is applied to the on-the-grid states to produce off-the-grid predictions. The modular design of Aardvark allows for pre-training on ERA5, a large, high-quality reanalysis dataset
  • Figure 2: Illustration of the different data sources leveraged in Aardvark. The input data to Aardvark consist of a combination of observations from remote sensing instruments (top row), which we pre-grid before passing to the model, and in-situ observations from land and marine observation platforms and radiosondes (bottom row). Each of these data modalities contains several observational variables, of which we select a subset here for the purposes of illustration. The remote sensing data also include a range of metadata about the measurements, omitted here for simplicity. White areas indicate regions of missing data which must be handled by the encoder module
  • Figure 3: Gridded global forecast performance for selected variables. Latitude-weighted RMSE using ERA5 reanalysis data as the ground truth, on the held-out test year (2018), for the four surface variables: 2-metre temperature (T2M), 10-metre eastward wind (U10), 10-metre northward wind (V10) and mean sea level pressure (MSLP) (first row) and four headline upper-atmosphere variables: temperature at 850hPa (T850), eastward wind at 700hPa (U700), specific humidity at 700hPa (Q700) and geopotential at 500hPa (Z500) (second row), as a function of lead time $t$. At lead time $t = 0$, Aardvark predicts the initial atmospheric state from observational data alone. The error at $t = 0$ corresponds to the error in the initial state. Note that HRES has non-zero error at $t = 0$, as it is compared to ERA5 reanalysis ground truth. Results for the full set of variables predicted by Aardvark are given in appendix \ref{app:forecast_full}
  • Figure 4: Illustration of Aardvark's global gridded forecasts for 10-metre wind speed. Plots of the initial condition (first row) and subsequent forecasts (second, third and fourth rows) for 10-metre wind speed (U10), showing Aardvark's prediction (left), the ERA5 ground truth (middle), and the difference between the two (right). Lead time $t = 0$ corresponds to 00:00 on the $11^{\text{th}}$ of January 2018. Aardvark correctly predicts large-scale features for this variable, and correctly predicts the formation and positioning of the tropical cyclone Berguitta (highlighted in the magenta boxes), which reached peak intensity on the $15^{\text{th}}$ of January 2018 off the coast of Madagascar. We emphasise that the model makes these predictions entirely from raw observations, without any NWP products as input. Appendix \ref{app:plots} gives example plots for all variables predicted by Aardvark
  • Figure 5: Station downscaling and end-to-end performance. Results for station forecasting (top) and end-to-end optimisation (bottom) for the held-out test set (2018) of HadISD data. Here, Aardvark makes predictions at spatial locations observed during training, on temporally held-out data; however, it can generate predictions at any arbitrary station location. For station forecasting (top) we compare Aardvark's forecasts to two state-of-the-art NWP baselines: the National Digital Forecast Database (NDFD) for CONUS, and a version of HRES that we correct using a scale and bias term learned separately for each station (see text for discussion). In end-to-end fine-tuning (bottom), we compare the predictions of Aardvark for lead time $t = 1$ to those of its end-to-end fine-tuned counterpart for 2-metre temperature (T2M) and 10-metre wind speed (WS). We report the mean % improvement in each variable by region (see bottom left) with 95% confidence intervals. "Global" includes all stations (black and coloured)
  • ...and 25 more figures
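
The Figure 3 caption reports latitude-weighted RMSE. The sketch below shows one common way to compute such a score on a regular latitude-longitude grid, assuming the widely used convention of weighting each row by the cosine of its latitude, normalised so the weights average to one. The exact normalisation used in the paper is not stated in the captions above, so this convention, the function name `latitude_weighted_rmse`, and the array layout are assumptions for illustration.

```python
import numpy as np

def latitude_weighted_rmse(forecast, truth, lats_deg):
    """Latitude-weighted RMSE on a regular lat-lon grid.

    forecast, truth: arrays of shape (n_lat, n_lon).
    lats_deg: array of shape (n_lat,), latitude of each row in degrees.
    Assumed convention: weights are cos(latitude), normalised to mean 1.
    """
    forecast = np.asarray(forecast, dtype=float)
    truth = np.asarray(truth, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    weights = weights / weights.mean()
    squared_error = (forecast - truth) ** 2
    # Broadcast the per-row weight across all longitudes in that row.
    return float(np.sqrt(np.mean(weights[:, None] * squared_error)))

lats = np.array([-60.0, 0.0, 60.0])
truth = np.zeros((3, 4))

# A uniform error of 2 everywhere gives an RMSE of 2 regardless of weights.
uniform = latitude_weighted_rmse(truth + 2.0, truth, lats)

# An error confined to the equator row counts for more than it would
# in an unweighted RMSE, because equatorial cells cover more area.
equator_only = truth.copy()
equator_only[1, :] = 1.0
weighted = latitude_weighted_rmse(equator_only, truth, lats)
unweighted = float(np.sqrt(np.mean((equator_only - truth) ** 2)))
```

The weighting matters because cells on a regular latitude-longitude grid shrink in area towards the poles; without it, polar errors would be over-represented relative to the area they cover.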