Statistical inference using machine learning and classical techniques based on accumulated local effects (ALE)
Chitu Okoli
TL;DR
This work advances ALE as a robust, model-agnostic framework for global ML explanations. It addresses the reliability of ALE on small datasets, introduces interpretable ALE-based effect size measures (ALER, ALED, NALER, NALED), and establishes bootstrapped confidence regions for inference. It also prescribes full-model bootstrapping to mitigate overfitting in small samples, and demonstrates these methods on a large dataset (diamonds) and a small one (math achievement), with implementations in the 'ale' package for R. These contributions support reliable, nuanced conclusions about predictor effects across a predictor's entire domain, pairing effect size summaries with confidence regions that reveal heterogeneous patterns. For researchers and practitioners, the result is a set of scalable, interpretable tools for statistical inference in ML contexts, including guidance on when an effect is practically meaningful and not merely statistically significant. The methods are particularly relevant where data are limited or effects are highly non-linear.
Abstract
Accumulated Local Effects (ALE) is a model-agnostic approach for global explanations of the results of black-box machine learning (ML) algorithms. There are at least three challenges with conducting statistical inference based on ALE: ensuring the reliability of ALE analyses, especially in the context of small datasets; intuitively characterizing a variable's overall effect in ML; and making robust inferences from ML data analysis. In response, we introduce innovative tools and techniques for statistical inference using ALE, establishing bootstrapped confidence intervals tailored to dataset size and introducing ALE effect size measures that intuitively indicate effects on both the outcome variable scale and a normalized scale. Furthermore, we demonstrate how to use these tools to draw reliable statistical inferences, reflecting the flexible patterns ALE adeptly highlights, with implementations available in the 'ale' package in R. This work propels the discourse on ALE and its applicability in ML and statistical analysis forward, offering practical solutions to prevailing challenges in the field.
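To make the idea of a bootstrapped confidence interval around an ALE curve concrete, here is a minimal Python sketch. It is not the 'ale' package's implementation. The function names (`ale_curve`, `bootstrap_ale`), the quantile binning, the count-weighted centering, and the toy prediction function are all assumptions made for illustration. The sketch resamples the data while holding the model fixed; the full-model bootstrapping mentioned in the TL;DR for small datasets would also refit the model on each resample, which is not shown here.

```python
import numpy as np

def ale_curve(predict, X, feature, edges):
    """First-order ALE of one feature on fixed bin edges (illustrative only)."""
    x = X[:, feature]
    n_bins = len(edges) - 1
    # Assign each row to a bin 0..n_bins-1; out-of-range values go to the end bins.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    local = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for k in range(n_bins):
        rows = X[idx == k]
        if len(rows) == 0:
            continue  # an empty bin contributes no local effect
        lo, hi = rows.copy(), rows.copy()
        lo[:, feature] = edges[k]      # move the feature to the bin's lower edge
        hi[:, feature] = edges[k + 1]  # move the feature to the bin's upper edge
        local[k] = np.mean(predict(hi) - predict(lo))
        counts[k] = len(rows)
    ale = np.cumsum(local)  # accumulated effect at each bin's upper edge
    # Center so the count-weighted mean effect is zero (an assumed convention).
    return ale - np.average(ale, weights=counts)

def bootstrap_ale(predict, X, feature, n_bins=10, n_boot=200, alpha=0.05, seed=0):
    """Percentile-bootstrap band for the ALE curve, with the model held fixed."""
    rng = np.random.default_rng(seed)
    # Bin edges come from the full data so every resample shares the same grid.
    edges = np.quantile(X[:, feature], np.linspace(0, 1, n_bins + 1))
    curves = np.empty((n_boot, n_bins))
    for b in range(n_boot):
        sample = X[rng.integers(0, len(X), size=len(X))]  # rows drawn with replacement
        curves[b] = ale_curve(predict, sample, feature, edges)
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    return edges, curves.mean(axis=0), lower, upper

# Toy data and a stand-in for a fitted model (hypothetical, for illustration).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 2))
predict = lambda Z: 2.0 * Z[:, 0] + Z[:, 1] ** 2
edges, mean_ale, lower, upper = bootstrap_ale(predict, X, feature=0)
```

Because the toy model is linear in the first feature with slope 2, each step in the averaged ALE curve equals twice the width of the corresponding bin, which gives a quick sanity check on the sketch.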
