Designing probabilistic AI monsoon forecasts to inform agricultural decision-making

Colin Aitken; Rajat Masiwal; Adam Marchakitus; Katherine Kowal; Mayank Gupta; Tyler Yang; Amir Jina; Pedram Hassanzadeh; William R. Boos; Michael Kremer

TL;DR

A decision-theory framework for designing useful forecasts in settings where the forecaster cannot prescribe optimal actions because farmers' circumstances are heterogeneous, together with a system for tailoring forecasts to the requirements of this framework by blending systematically benchmarked artificial intelligence weather prediction models with a new statistical model.

Abstract

Hundreds of millions of farmers make high-stakes decisions under uncertainty about future weather. Forecasts can inform these decisions, but available choices and their risks and benefits vary between farmers. We introduce a decision-theory framework for designing useful forecasts in settings where the forecaster cannot prescribe optimal actions because farmers' circumstances are heterogeneous. We apply this framework to the case of seasonal onset of monsoon rains, a key date for planting decisions and agricultural investments in many tropical countries. We develop a system for tailoring forecasts to the requirements of this framework by blending systematically benchmarked artificial intelligence (AI) weather prediction models with a new "evolving farmer expectations" statistical model. This statistical model applies Bayesian inference to historical observations to predict time-varying probabilities of first-occurrence events throughout a season. The blended system yields more skillful Indian monsoon forecasts at longer lead times than its components or any multi-model average. In 2025, this system was deployed operationally in a government-led program that delivered subseasonal monsoon onset forecasts to 38 million Indian farmers, skillfully predicting that year's early-summer anomalous dry period. This decision-theory framework and blending system offer a pathway for developing climate adaptation tools for large vulnerable populations around the world.
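The idea of time-varying probabilities for a first-occurrence event can be illustrated with a minimal sketch. This is not the authors' "evolving farmer expectations" model: it assumes a simple symmetric Dirichlet prior over onset days (the smoothing parameter `alpha` is hypothetical), and the function names and toy onset dates are invented for illustration. It shows only the conditioning step: once the season has progressed to a given day without an onset, the remaining probability mass is renormalized over the days that are still possible.

```python
import numpy as np

def onset_posterior(historical_onsets, n_days, alpha=1.0):
    """Posterior-mean probability of onset on each day of the season.

    Assumes a symmetric Dirichlet(alpha) prior over days (an illustrative
    choice, not the paper's model). Onset days are 0-based indices < n_days.
    """
    counts = np.bincount(np.asarray(historical_onsets), minlength=n_days).astype(float)
    return (counts + alpha) / (counts.sum() + alpha * n_days)

def conditional_window_prob(p, today, start, end):
    """P(onset in days [start, end] | no onset on or before day `today`)."""
    start = max(start, today + 1)      # days already passed cannot host the onset
    remaining = p[today + 1:].sum()    # probability mass still available
    if remaining == 0:
        return 0.0
    return p[start:end + 1].sum() / remaining

# Toy data: onset day-of-season in five hypothetical past years.
p = onset_posterior([10, 12, 12, 15, 20], n_days=30)
early = conditional_window_prob(p, today=0, start=15, end=21)
late = conditional_window_prob(p, today=13, start=15, end=21)
```

In this toy example `late` exceeds `early`: as the season progresses without an onset, the probability assigned to a fixed future week rises, which is the qualitative behavior the abstract describes.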

Paper Structure (20 sections, 6 theorems, 8 equations, 9 figures, 1 table)

Introduction
Decision-Theory Framework Informing Forecast Design
An Evolving-Expectations Model
Blending AI and Statistical Forecasts
Discussion
Methods
Acknowledgments
Author contributions
Data availability
Figures

Key Result

Proposition 1

Suppose forecasts are well-calibrated. Then each farmer is (weakly) better off in expectation if the forecaster provides them with the probabilities of each potential outcome rather than coarsening them to deterministic forecasts.
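A small numerical illustration of Proposition 1, under assumptions that are not taken from the paper: a stylized cost-loss setting in which a farmer can pay `cost` for a protective action that avoids a `loss` if the event occurs, and a deterministic forecast that says "event" whenever the probability is at least a hypothetical `threshold` of 0.5. Because forecasts are calibrated, a farmer who sees only the yes/no signal is assumed to know the average event probability behind each signal and to respond to it optimally.

```python
from statistics import mean

def expected_loss(p, act, cost, loss):
    """Expected loss for one forecast: pay `cost` if acting, else risk `loss`."""
    return cost if act else p * loss

def avg_loss_full(probs, cost, loss):
    """Farmer sees each calibrated probability and acts when p * loss > cost."""
    return mean(expected_loss(p, p * loss > cost, cost, loss) for p in probs)

def avg_loss_coarse(probs, cost, loss, threshold=0.5):
    """Farmer sees only a yes/no forecast and responds optimally to each signal."""
    total = 0.0
    for signal in (True, False):
        group = [p for p in probs if (p >= threshold) == signal]
        if not group:
            continue
        p_bar = mean(group)           # calibrated event rate behind this signal
        act = p_bar * loss > cost     # best single action for the whole group
        total += sum(expected_loss(p, act, cost, loss) for p in group)
    return total / len(probs)

probs = [0.1, 0.3, 0.6, 0.9]          # toy calibrated forecast probabilities
for cost in (0.25, 0.5, 0.8):         # three farmers with different cost/loss ratios
    full = avg_loss_full(probs, cost, loss=1.0)
    coarse = avg_loss_coarse(probs, cost, loss=1.0)
    print(f"cost={cost}: full={full:.4f}, coarse={coarse:.4f}")
```

The farmer whose cost-loss ratio happens to match the forecaster's threshold loses nothing from coarsening, while farmers with other ratios do strictly worse in this toy example, which is the sense in which heterogeneity favors publishing probabilities.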

Figures (9)

Figure 1: Static climatology and the evolving-expectations model. A) Median onset date for the grid cells used in dissemination. B) The distribution of possible onset dates predicted by the evolving-expectations model for a grid cell centered at latitude 26$^\circ$N and longitude 78$^\circ$E. Shaded distribution shows the unconditional probability of onset; lines show the probability from the evolving-expectations model if the onset has not occurred by the date signified by the dot. Illustrative probabilities are calculated over the shaded grey region. As the season progresses without an onset, the probability of onset in any particular future week increases. C) The distribution of possible onset dates predicted by the evolving-expectations model for a grid cell centered at 18$^\circ$N and 80$^\circ$E. The decision-theory framework implies that models that cannot outperform this baseline should not be used on their own for dissemination.
Figure 1: Skill scores aggregated by test period. The blended model is cross-validated during the 2000-2024 period, and trained on 2000-2024 data during the other periods. The climatology model and the evolving expectations model are cross-validated by year using 1900-2024 IMD data. Skill scores are all computed relative to static climatology.
Figure 2: Evaluation of temporal components of models' skill for the 2000-2024 period. A) Area under ROC curve (AUC) by lead time. The baseline for the bars is chosen to be 0.5, indicating the AUC of a forecast with no ability to distinguish onsets from non-onsets. B) Brier skill score by lead time, computed relative to a traditional (static) climatology model. C) AUC by year across all lead times. The 2000-2024 scores are computed via cross-validation. The 2025 scores are only for forecasts that were actually disseminated before Moron-Robertson onset in each grid cell. Dissemination began in late May, so the set of initialization dates is smaller than in other years. D) Brier skill score by year across all lead times, calculated relative to a traditional (static) climatology model. A deterministic version of AIFS was used in both the blended model and benchmarking, so NGCM is shown here as a representative AIWP model as it performs better on these probabilistic metrics (Fig. 4). See Extended Data Fig. 2 for results for the 1965-1978 period.
Extended Data Figure 2: Temporal components of model skill, 1965-1978. A) Area under ROC curve (AUC) by lead time during the 1965-1978 pre-satellite-era period. The baseline for the bars is chosen to be 0.5, indicating the AUC of a forecast with no ability to distinguish onsets from non-onsets. B) Brier skill score by lead time, computed relative to a traditional (static) climatology model. C) Area under ROC curve by year across all lead times. The 2000-2024 scores are computed via cross-validation. D) Brier skill score by year across all lead times, calculated relative to a traditional (static) climatology model.
Figure 3: Reliability diagram and histogram of probabilities assigned by different models during the 2000-2024 cross-validation period. If the model is well-calibrated the points should line up with the dashed line. Each point represents a decile of probabilities assigned by the model, and compares the average predicted probability in that decile and the fraction of events which occurred in observation. Histograms are normalized so that the bin capturing probabilities between 0 and 10% has height equal to one. A deterministic version of AIFS was used for both benchmarking and the blended model, so NGCM is used as a representative AIWP model for this figure. See Extended Data Fig. 3 for the results from the 1965-1978 period.
...and 4 more figures
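Several of the captions above report a Brier skill score relative to static climatology. As a reminder of what that number means, here is a minimal sketch using the standard definition, BSS = 1 - BS_model / BS_reference, where the Brier score is the mean squared difference between forecast probability and the 0/1 outcome. The function names and toy arrays are illustrative only and do not come from the paper's evaluation code.

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

def brier_skill_score(model_probs, reference_probs, outcomes):
    """1 - BS_model / BS_reference; positive values mean the model beats the reference."""
    bs_ref = brier_score(reference_probs, outcomes)
    if bs_ref == 0:
        raise ValueError("reference forecast is perfect; skill score is undefined")
    return 1.0 - brier_score(model_probs, outcomes) / bs_ref

# Toy example: four forecast windows, with a flat 50% reference forecast
# standing in for a climatological baseline.
outcomes = [1, 0, 0, 1]
model = [0.8, 0.2, 0.1, 0.7]
climatology = [0.5, 0.5, 0.5, 0.5]
bss = brier_skill_score(model, climatology, outcomes)
```

A score of 0 means no improvement over the reference, 1 is a perfect forecast, and negative values mean the model is worse than the reference.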

Theorems & Definitions (6)

Proposition 1
Proposition 2
Proposition 3
Proposition 4
Proposition 5
Proposition 6
