Tackling the Accuracy-Interpretability Trade-off in a Hierarchy of Machine Learning Models for the Prediction of Extreme Heatwaves

Alessandro Lovo; Amaury Lancelin; Corentin Herbert; Freddy Bouchet

TL;DR

This study tackles the accuracy-interpretability trade-off in predicting extreme heatwaves over France by evaluating a hierarchy of ML models from Gaussian Approximation to CNNs, including an Intrinsically Interpretable Neural Network (IINN) and a Scattering Transform-based ScatNet. It shows that with ample data, CNNs and ScatNet offer similar predictive skill, but ScatNet provides clearer, global interpretability and reveals sub-synoptic geopotential oscillations as additional predictive drivers; with limited data, the simple GA can outperform more complex models, underscoring data-scarcity constraints. Interpretability analyses reveal that GA and IINN yield transparent, physics-aligned input patterns, while CNN explanations are generally modest refinements on GA predictions, and ScatNet delivers a principled, scale-aware view of information content. Overall, ScatNet emerges as a promising, interpretable alternative to CNNs for climate predictions, capable of uncovering physically meaningful scales and orientations that influence extreme heatwave likelihood.

Abstract

When making predictions with Machine Learning (ML), we are mainly interested in performance and interpretability. This creates a natural trade-off: complex models generally have higher skill but are harder to explain and thus to trust. Interpretability is particularly important in the climate community, where we aim to gain a physical understanding of the underlying phenomena, and even more so when the prediction concerns extreme weather events with a high impact on society. In this paper, we perform probabilistic forecasts of extreme heatwaves over France, using a hierarchy of increasingly complex ML models, which allows us to find the best compromise between accuracy and interpretability. More precisely, we use models that range from a global Gaussian Approximation (GA) to deep Convolutional Neural Networks (CNNs), with the intermediate steps of a simple Intrinsically Interpretable Neural Network (IINN) and a model using the Scattering Transform (ScatNet). Our findings reveal that CNNs provide higher accuracy, but their black-box nature severely limits interpretability, even when using state-of-the-art Explainable Artificial Intelligence (XAI) tools. In contrast, ScatNet achieves similar performance to CNNs while providing greater transparency, identifying key scales and patterns in the data that drive predictions. This study underscores the potential of interpretability in ML models for climate science, demonstrating that simpler models can rival the performance of their more complex counterparts while being much easier to understand. This gained interpretability is crucial for building trust in model predictions and uncovering new scientific insights, ultimately advancing our understanding and management of extreme weather events.

Paper Structure (24 sections, 11 equations, 6 figures, 5 tables)

Introduction
Data and methods
Data
Heatwave amplitude
Predictors
Probabilistic regression
The model hierarchy
Gaussian approximation
Intrinsically Interpretable Neural Network
Scattering Network
Convolutional Neural Network
Hyperparameter optimization
Performance
Training on the full dataset
Training on a smaller dataset
...and 9 more sections

Figures (6)

Figure 1: Projection patterns (top) and projected space (bottom) for GA (left) and IINN (right). In the bottom plots, the black dots are the test data, the continuous line is the predicted $\hat{\mu}(X)$ and the shading corresponds to $\pm \hat{\sigma}(X)$. We show among the 5 models the one with the highest skill.
Figure 2: Top: average of the optimal inputs $S$ that maximize the heatwave amplitude $\hat{\mu}_\mathrm{CNN}(S)$ predicted by the CNN, across the different seeds $S_0$ taken from the test dataset. Bottom: signal-to-noise ratio, computed as the pixelwise ratio between the mean (top row) and the standard deviation of the optimal inputs (see Fig. S11 of the Supplementary Materials).
Figure 3: Top: average of the optimal inputs $S$ that maximize the heatwave amplitude $\hat{\mu}_\mathrm{CNN}(S)$ predicted by the CNN while keeping fixed the prediction $\hat{\mu}_\mathrm{GA}(S)$ of the Gaussian approximation, across the different seeds $S_0$ taken from the test dataset. Bottom: signal-to-noise ratio, computed as the pixelwise ratio between the mean (top row) and the standard deviation of the optimal inputs (see Fig. S12 of the Supplementary Materials).
Figure 4: Top row: Several normalized Z500 (no units) initial conditions $X$ associated with $A$ above the $99^{th}$ percentile (heatwaves). Second row: Expected Gradients feature importance (EGFI) of the CNN predictions on these inputs. Third row: pointwise multiplication between inputs and GA projection pattern. Since $\hat{\mu}_{GA}(X)$ is linear in $X$, this is equivalent to EGFI for the GA prediction. Fourth row: EGFI CNN minus EGFI GA for each input.
Figure 5: Projection patterns of the GA (left) and the feature importance of coarse Z500 field for the prediction of $\hat{\mu}(X)$ with ScatNet (right). We show, among the 5 models, the one with the highest skill.
...and 1 more figure
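
Two of the quantities described in the captions above are simple enough to sketch in a few lines of NumPy. The block below is an illustration under stated assumptions, not the authors' code: `pixelwise_snr` follows the description in Figures 2 and 3 (pixelwise mean divided by pixelwise standard deviation over a stack of optimal inputs), and `expected_gradients_linear` illustrates the remark in Figure 4 that, for a predictor linear in $X$, Expected Gradients reduces to a pointwise product between the input and the projection pattern. The function names, array shapes, the small `eps` regularizer, and the assumption that the baseline inputs are normalized to zero mean at each pixel are all hypothetical choices made for this sketch.

```python
import numpy as np


def pixelwise_snr(optimal_inputs, eps=1e-12):
    """Pixelwise mean / standard deviation over a stack of maps.

    optimal_inputs: array of shape (n_seeds, height, width), one optimized
    input per seed (hypothetical layout). eps is an assumed regularizer that
    avoids division by zero where all seeds agree exactly.
    """
    mean = optimal_inputs.mean(axis=0)
    std = optimal_inputs.std(axis=0)
    return mean / (std + eps)


def expected_gradients_linear(x, pattern, baselines):
    """Expected Gradients attribution for a linear predictor sum(pattern * x).

    The gradient of a linear predictor is `pattern` everywhere, so the
    integral along the path from each baseline to x is trivial and the
    attribution is pattern * (x - mean of the baselines).
    """
    diffs = x[None, ...] - baselines          # (n_baselines, height, width)
    return (pattern[None, ...] * diffs).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Synthetic stand-ins for optimized inputs, a projection pattern, and data.
    stack = rng.normal(loc=0.5, size=(50, 4, 6))
    print("SNR map shape:", pixelwise_snr(stack).shape)

    baselines = rng.normal(size=(200, 4, 6))
    baselines -= baselines.mean(axis=0)       # assumed zero-mean normalization
    pattern = rng.normal(size=(4, 6))
    x = rng.normal(size=(4, 6))

    attribution = expected_gradients_linear(x, pattern, baselines)
    # With zero-mean baselines the attribution is the pointwise product.
    print("matches pattern * x:", np.allclose(attribution, pattern * x))
```

If the baselines were not centered, the attribution would instead be the pattern multiplied by the input's deviation from the baseline mean, so the equivalence with a plain pointwise product depends on the normalization assumed here.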
