Machine learning models for daily rainfall forecasting in Northern Tropical Africa using tropical wave predictors

Athul Rasheeda Satheesh, Peter Knippertz, Andreas H. Fink

TL;DR

This study tackles the underperformance of numerical weather prediction for daily rainfall in northern tropical Africa by leveraging machine learning models trained on tropical-wave predictors derived from GPM IMERG. It develops a predictor-selection pipeline and uses gamma regression and a 1D CNN, calibrated with EasyUQ, to generate probabilistic forecasts that outperform EPC15 and ECMWF ENS benchmarks. The results show downstream tropical-wave predictors, especially TD-type waves, as primary predictors, with CNN offering the strongest regional skill gains, notably in the Sahel and Congo Basin. The work demonstrates the practical potential of TW-based ML forecasts for operational rainfall prediction in tropical Africa and highlights pathways for integration with existing forecasting systems.

Abstract

Numerical weather prediction (NWP) models often underperform compared to simpler climatology-based precipitation forecasts in northern tropical Africa, even after statistical postprocessing. AI-based forecasting models show promise but have avoided precipitation due to its complexity. Synoptic-scale forcings like African easterly waves and other tropical waves (TWs) are important for predictability in tropical Africa, yet their value for predicting daily rainfall remains unexplored. This study uses two machine-learning models--gamma regression and a convolutional neural network (CNN)--trained on TW predictors from satellite-based GPM IMERG data to predict daily rainfall during the July-September monsoon season. Predictor variables are derived from the local amplitude and phase information of seven TWs at the target grid point and at neighboring upstream and downstream grid points, at 1-degree spatial resolution. The ML models are combined with Easy Uncertainty Quantification (EasyUQ) to generate calibrated probabilistic forecasts and are compared with three benchmarks: Extended Probabilistic Climatology (EPC15), the ECMWF operational ensemble forecast (ENS), and a probabilistic forecast from the ENS control member using EasyUQ (CTRL EasyUQ). The study finds that downstream predictor variables offer the highest predictability, with downstream tropical depression (TD)-type wave-based predictors being most important. Other waves like mixed Rossby-gravity (MRG), Kelvin, and inertio-gravity waves also contribute significantly but show regional preferences. ENS forecasts exhibit poor skill due to miscalibration. CTRL EasyUQ shows improvement over ENS and marginal enhancement over EPC15. Both gamma regression and CNN forecasts significantly outperform the benchmarks in tropical Africa. This study highlights the potential of ML models trained on TW-based predictors to improve daily precipitation forecasts in tropical Africa.
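
Comparisons between probabilistic forecasts like those above are commonly made with the continuous ranked probability score (CRPS), which the Figure 5 caption also reports. The following is a minimal sketch of the standard ensemble CRPS estimator and a skill score relative to a reference forecast. It is not the authors' code; the function names `crps_ensemble` and `crps_skill_score` and the toy numbers are illustrative assumptions.

```python
def crps_ensemble(members, obs):
    """CRPS of an ensemble forecast for one observation.

    Uses the standard estimator E|X - y| - 0.5 * E|X - X'|, where X and X'
    are independent draws from the ensemble. Lower is better.
    """
    m = len(members)
    # Mean absolute error of the members against the observation.
    term1 = sum(abs(x - obs) for x in members) / m
    # Half the mean absolute difference between all member pairs.
    term2 = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return term1 - term2


def crps_skill_score(crps_forecast, crps_reference):
    """Skill relative to a reference (e.g. a climatological forecast).

    Positive values mean the forecast beats the reference; 1 is perfect.
    """
    return 1.0 - crps_forecast / crps_reference


# Toy example with made-up rainfall values in mm (not data from the paper).
forecast_members = [0.0, 2.0]
reference_members = [0.0, 10.0]
observation = 1.0

crps_f = crps_ensemble(forecast_members, observation)
crps_r = crps_ensemble(reference_members, observation)
skill = crps_skill_score(crps_f, crps_r)
```

For a single-member ensemble the estimator reduces to the absolute error, which makes the CRPS directly comparable between deterministic and probabilistic forecasts.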

Paper Structure

This paper contains 20 sections, 8 equations, 16 figures, and 2 tables.

Figures (16)

  • Figure 1: Geographical overview of the analysis domain. Shading indicates altitude in metres. The red dashed lines demarcate the nested core domain ($25^\circ \text{W}-35^\circ \text{E}$ in longitude and $0^\circ -18^\circ \text{N}$ in latitude) where the machine learning forecasts are issued. Modified from rasheeda2023sources.
  • Figure 2: Schematic illustration of the mechanism behind the predictor selection algorithm. DN (UN) refers to N grid points downstream (upstream) of the target grid point (T) in the direction of the considered wave's propagation. The top (bottom) sketch shows how downstream (upstream) predictors offer predictability (see Section \ref{data_methods}\ref{grad_boost} for details). Sketches in lower opacity indicate the future locations of the propagating rainfall system modulated by a TW.
  • Figure 3: Relative importance of predictor variables at the grid point nearest to Niamey ($13^\circ$N, $2^\circ$E) for every year from 2007 to 2019. 'Wave UN' ('Wave DN') refers to the wave predictor N grid points upstream (downstream) of the target in the direction of wave propagation. 'Wave' alone refers to the wave predictor at the target grid point. Shading indicates the relative predictor importance in percent. The 'UN' and 'DN' notations are used throughout the text to refer to upstream and downstream predictors. Note that the relative importances of all 63 predictors sum to 100$\%$ in every year. Only unhatched predictors are selected for training the forecast models.
  • Figure 4: Ranks of PWAs for a) TD, b) MRG, c) Kelvin, d) IG1, and e) EIG waves (lower ranks denote higher relative importance) identified solely from 3 grid points downstream of the target grid point (D3) within the analysis domain during JAS from 2007 to 2019. We shade ranks only up to 21 out of 63 to highlight the most important predictors. We exclude MJO and ER due to their limited relevance for daily rainfall in tropical Africa.
  • Figure 5: Illustration of CNN-based forecast of 24-hour rainfall accumulation at the grid point near Niamey ($13^\circ$ N, $2^\circ$ E) using TW-based predictors: a) A histogram depicting the distribution of observed (blue) and (deterministically) forecast (orange) rainfall; b) A scatter plot of observed (x-axis) and forecast (y-axis) rainfall with the Pearson correlation coefficient ($\rho$) between the two variables displayed in the title. The red dashed line along the diagonal denotes perfect correlation ($\rho=1$); c) A PIT histogram demonstrating the calibration of the CNN (probabilistic) forecast. The red dashed line represents a standard uniform distribution, serving as a reference for a perfectly calibrated forecast; d) A time series plot exhibiting observed (blue) and (deterministic) forecast (orange) rainfall for all JAS seasons from 2007 to 2019. The grey shading indicates the 5th to 95th percentile range of the CNN (probabilistic) forecast. Additionally, the mean CRPS of the probabilistic forecast and the Taylor Score of the deterministic forecast are also provided.
  • ...and 11 more figures
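
The predictor naming and ranking described in the Figure 3 and Figure 4 captions can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. The seven wave names come from the Figure 4 caption. The choice of four upstream and four downstream neighbours is an assumption inferred from 63 predictors divided by 7 waves giving 9 positions per wave. The raw importance scores are random placeholders, and the helper names are hypothetical.

```python
import random

# Wave types named in the Figure 4 caption (five shown, MJO and ER excluded there).
WAVES = ["TD", "MRG", "Kelvin", "IG1", "EIG", "MJO", "ER"]
# Assumption: 63 predictors / 7 waves = 9 positions = target + 4 upstream + 4 downstream.
N_NEIGHBOURS = 4


def predictor_names(waves=WAVES, n=N_NEIGHBOURS):
    """Build names such as 'TD U2', 'TD', 'TD D3' in the caption's notation."""
    names = []
    for wave in waves:
        names += [f"{wave} U{k}" for k in range(n, 0, -1)]  # upstream, farthest first
        names.append(wave)                                   # target grid point
        names += [f"{wave} D{k}" for k in range(1, n + 1)]  # downstream
    return names


def relative_importance(raw):
    """Rescale raw importance scores so that they sum to 100 percent."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}


def rank_predictors(importance):
    """Rank 1 is the most important predictor (lower rank = higher importance)."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


names = predictor_names()
rng = random.Random(0)
raw_scores = {name: rng.random() for name in names}  # placeholder scores, not real data
importance = relative_importance(raw_scores)
ranks = rank_predictors(importance)
# Figure 4 shades ranks up to 21 of 63; the same cutoff is used here for illustration.
top21 = [name for name in names if ranks[name] <= 21]
```

In practice the raw scores would come from a fitted model's feature importances rather than a random generator, and the rule for which predictors are kept (the unhatched ones in Figure 3) is defined in the paper, not by the cutoff used here.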