Machine Learning for Stochastic Parametrisation

Hannah M. Christensen, Salah Kouhen, Greta Miller, Raghul Parthipan

TL;DR

This position paper discusses the potential for data-driven approaches to stochastic parametrisation in weather and climate prediction, highlights early studies in this area, and draws attention to the novel challenges that remain.

Abstract

Atmospheric models used for weather and climate prediction are traditionally formulated in a deterministic manner. In other words, given a particular state of the resolved-scale variables, the most likely forcing from the sub-grid-scale processes is estimated and used to predict the evolution of the large-scale flow. However, the lack of scale separation in the atmosphere means that this approach is a large source of error in forecasts. Over recent years, an alternative paradigm has developed: the use of stochastic techniques to characterise uncertainty in small-scale processes. These techniques are now widely used across weather, sub-seasonal, seasonal, and climate timescales. In parallel, recent years have also seen significant progress in replacing parametrisation schemes with machine learning (ML), which has the potential both to speed up and to improve our numerical models. However, the focus to date has largely been on deterministic approaches. In this position paper, we bring together these two key developments and discuss the potential for data-driven approaches to stochastic parametrisation. We highlight early studies in this area and draw attention to the novel challenges that remain.
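The abstract contrasts a deterministic closure, which returns the most likely sub-grid forcing for a given resolved state, with a stochastic one that samples from a distribution of plausible forcings. The sketch below illustrates that difference on a hypothetical scalar toy model; the linear closure, the additive Gaussian noise and all parameter values are illustrative assumptions, not a scheme from the paper.

```python
import random
import statistics
from typing import Optional


def deterministic_tendency(x: float) -> float:
    """Hypothetical closure: the most likely (conditional-mean) sub-grid
    tendency given the resolved-scale state x."""
    return -0.5 * x + 0.1


def stochastic_tendency(x: float, rng: random.Random, sigma: float = 0.2) -> float:
    """Same conditional mean plus a Gaussian draw standing in for the
    unresolved variability (additive noise is an illustrative assumption)."""
    return deterministic_tendency(x) + rng.gauss(0.0, sigma)


def integrate(x0: float, n_steps: int, rng: Optional[random.Random] = None,
              dt: float = 0.1) -> float:
    """Forward-Euler integration of dx/dt = sub-grid tendency.
    Uses the stochastic scheme when an RNG is supplied, else the deterministic one."""
    x = x0
    for _ in range(n_steps):
        if rng is None:
            tendency = deterministic_tendency(x)
        else:
            tendency = stochastic_tendency(x, rng)
        x += dt * tendency
    return x


# 50-member ensembles started from the same initial state.
det_members = [integrate(1.0, 100) for _ in range(50)]
rng = random.Random(0)
sto_members = [integrate(1.0, 100, rng) for _ in range(50)]

print(statistics.pstdev(det_members))  # 0.0: every deterministic member is identical
print(statistics.pstdev(sto_members))  # positive: the stochastic scheme generates spread
```

The deterministic ensemble collapses onto a single trajectory, so it carries no information about sub-grid uncertainty; the stochastic members differ even though they share the same mean closure.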

Paper Structure

This paper contains 16 sections and 4 figures.

Figures (4)

  • Figure 1: Coarse-graining studies provide evidence for stochastic parametrisations. (a) The pdf of 'true' sub-grid temperature tendencies, derived from a high-resolution simulation, conditioned on the tendency predicted by a deterministic forecast model ($T_{fc}$; colours in legend). (b) Mean 'true' tendency conditioned on $T_{fc}$. For this forecast model, positive temperature tendencies are well calibrated, while negative temperature tendencies are biased cold. (c) Standard deviation of 'true' tendency conditioned on $T_{fc}$. For this forecast model, the uncertainty in the 'true' tendency increases with the magnitude of the low-resolution forecast tendency. Figure adapted from christensen2020.
  • Figure 2: Reliability curve for convection occurrence estimated by the random forest (green line), which is close to perfect reliability (grey line). The random forest was developed for use as a stochastic convection trigger function. The circle sizes are proportional to the log of the number of samples per bin; there are many more non-convection events (91%) than convection events (9%). Figure adapted from miller2024.
  • Figure 3: a. The classic cellular automaton, the Game of Life, after 70 rule iterations from random initial conditions. b. A set of rules discovered using a genetic algorithm, after 70 iterations from a random initial condition. c. An example of fitness convergence for a genetic algorithm scheme.
  • Figure 4: Cloud fraction as a function of height (model levels) for 200-hour forecasts. Observed cloud fraction is compared with that from the operational deterministic parametrisation and from two stochastic ML models. The Baseline ML model is a simple feed-forward neural network, whilst the Mixed ML model splits the modelling task into a binary classification problem and a continuous prediction problem. Both are probabilistic models, and three sampled trajectories are shown for each. The Mixed model is better able to create and remove cloud. Adapted from parthipan_thesis.
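The conditional statistics in panels (b) and (c) of Figure 1 amount to binning coarse-grained pairs by the forecast-model tendency $T_{fc}$ and computing the mean and standard deviation of the 'true' tendency in each bin. The following is a minimal sketch of that binning; the function name `conditional_stats`, the bin width and the synthetic data (whose spread is assumed to grow with $|T_{fc}|$) are illustrative, not the data or code behind the figure.

```python
import math
import random
import statistics
from collections import defaultdict


def conditional_stats(t_fc, t_true, bin_width):
    """Bin pairs by the forecast-model tendency t_fc and return
    {bin centre: (mean, standard deviation, count)} of the 'true' tendency."""
    groups = defaultdict(list)
    for fc, tr in zip(t_fc, t_true):
        groups[math.floor(fc / bin_width)].append(tr)
    result = {}
    for idx in sorted(groups):
        values = groups[idx]
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        result[(idx + 0.5) * bin_width] = (statistics.fmean(values), spread, len(values))
    return result


# Synthetic stand-in for coarse-grained data: the 'true' tendency scatters
# around t_fc with a spread that grows with |t_fc| (an assumed relationship).
rng = random.Random(1)
t_fc = [rng.uniform(-2.0, 2.0) for _ in range(20000)]
t_true = [fc + rng.gauss(0.0, 0.1 + 0.3 * abs(fc)) for fc in t_fc]

for centre, (mean, std, count) in conditional_stats(t_fc, t_true, 0.5).items():
    print(f"T_fc ~ {centre:+.2f}: mean {mean:+.3f}, std {std:.3f}, n={count}")
```

A conditional standard deviation that varies with $T_{fc}$, as in panel (c), is the kind of signal that motivates a state-dependent stochastic term rather than a single best-guess tendency.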
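A reliability curve like the one in Figure 2 groups forecast probabilities into bins and compares the mean forecast probability in each bin with the observed event frequency. The sketch below computes such a curve and uses the probability as a stochastic trigger (fire with probability $p$). It is an illustration under stated assumptions: the synthetic probabilities, drawn from a Beta(0.5, 5) distribution to give roughly the 9% base rate quoted in the caption, and the helper names are hypothetical, not the random forest from the figure.

```python
import random


def reliability_curve(probs, outcomes, n_bins=10):
    """Return (mean forecast probability, observed frequency, count) for each
    non-empty probability bin; a reliable model has the first two roughly equal."""
    prob_sum = [0.0] * n_bins
    event_count = [0] * n_bins
    total = [0] * n_bins
    for p, y in zip(probs, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the top bin
        prob_sum[b] += p
        event_count[b] += y
        total[b] += 1
    return [(prob_sum[b] / total[b], event_count[b] / total[b], total[b])
            for b in range(n_bins) if total[b] > 0]


def stochastic_trigger(p, rng):
    """Hypothetical trigger: initiate convection with probability p."""
    return rng.random() < p


# Synthetic probabilities with roughly a 9% base rate (Beta(0.5, 5) has mean 1/11).
rng = random.Random(2)
probs = [rng.betavariate(0.5, 5.0) for _ in range(50000)]
# Outcomes drawn from the probabilities themselves, so reliability holds by construction.
outcomes = [int(stochastic_trigger(p, rng)) for p in probs]

for mean_p, freq, n in reliability_curve(probs, outcomes):
    print(f"forecast {mean_p:.2f}  observed {freq:.2f}  n={n}")
```

Because the bins are very unevenly populated in such an imbalanced problem, the per-bin counts matter when reading the curve; this is why the figure scales circle size with the number of samples.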
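Figure 3 refers to rule iterations of a cellular automaton. The sketch below implements one update of a "life-like" automaton on a periodic grid, parametrised by the neighbour counts that cause a birth or allow survival. The defaults reproduce the Game of Life (birth on 3, survival on 2 or 3); treating the birth and survival sets as the search space for a genetic algorithm is an assumption made here for illustration, and the rule set discovered in panel b is not reproduced.

```python
import random


def life_step(grid, birth=frozenset({3}), survive=frozenset({2, 3})):
    """One update of a life-like cellular automaton on a periodic (toroidal) grid.
    The defaults give the Game of Life; other birth/survival sets define
    alternative rules of the kind a search procedure could explore."""
    n_rows, n_cols = len(grid), len(grid[0])
    new = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            # Count live cells among the eight neighbours, wrapping at the edges.
            neighbours = sum(
                grid[(i + di) % n_rows][(j + dj) % n_cols]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            rule = survive if grid[i][j] else birth
            new[i][j] = 1 if neighbours in rule else 0
    return new


def run(grid, n_iter, **rule):
    """Apply life_step n_iter times with the given birth/survive rule."""
    for _ in range(n_iter):
        grid = life_step(grid, **rule)
    return grid


# 70 iterations from a random initial condition, as in Figure 3a.
rng = random.Random(0)
grid = [[int(rng.random() < 0.3) for _ in range(32)] for _ in range(32)]
final = run(grid, 70)
print(sum(map(sum, final)), "live cells after 70 iterations")
```

Passing different sets, for example `run(grid, 70, birth=frozenset({3, 6}))`, yields a different life-like rule; a genetic algorithm would score such candidates with a fitness function and evolve them, producing a convergence curve like panel c.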
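The Mixed ML model in Figure 4 splits cloud prediction into a binary problem (is cloud present?) and a continuous one (how much?). A minimal two-part sampler of that form is sketched below, assuming a Bernoulli draw for presence followed by a Beta draw for the amount; the distributions, the per-level parameters and the function names are hypothetical and are not taken from parthipan_thesis.

```python
import random


def sample_cloud_fraction(p_cloud, alpha, beta, rng):
    """Two-part ('mixed') sample of cloud fraction in [0, 1]: a Bernoulli draw
    decides whether any cloud is present, then a Beta(alpha, beta) draw sets
    the amount. All distribution choices here are illustrative assumptions."""
    if rng.random() >= p_cloud:
        return 0.0  # exactly clear sky
    return rng.betavariate(alpha, beta)


def sample_profile(level_params, rng):
    """Draw one cloud-fraction profile; level_params holds a hypothetical
    (p_cloud, alpha, beta) triple per model level."""
    return [sample_cloud_fraction(p, a, b, rng) for p, a, b in level_params]


# Hypothetical per-level parameters: cloud more likely on the lower levels.
levels = [(0.8, 2.0, 3.0), (0.6, 2.0, 4.0), (0.3, 1.5, 5.0), (0.1, 1.0, 8.0)]
rng = random.Random(3)
for member in range(3):  # three independent samples (not time-evolving trajectories)
    print([round(f, 2) for f in sample_profile(levels, rng)])
```

One plausible reading of why such a split helps is that the binary stage can return exactly zero cloud, or switch cloud on from zero, whereas a single continuous output tends to produce small non-zero amounts; this is offered as intuition, not as the thesis's analysis.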