Challenges in interpretability of additive models
Xinyu Zhang, Julien Martinelli, ST John
TL;DR
This paper surveys generalized additive models (GAMs) and neural additive models (NAMs) as transparent, interpretable alternatives in machine learning, while arguing that interpretability is not guaranteed. It reviews the formal GAM/NAM framework, learning strategies for shape functions, and common interpretability metrics, and it discusses extensions such as higher-order interactions and uncertainty estimation. A central focus is nonidentifiability: because only the sum of the shape functions is observed, and because dependence among features (concurvity) lets one feature's effect be absorbed by another, the decomposition into individual shape functions is indeterminate. The result is multiple equally plausible explanations (Rashomon effects), which undermines straightforward interpretation. The authors advocate caution in claims of interpretability or of suitability for safety-critical applications, and propose ensemble approaches over the Rashomon set, together with domain-guided criteria, to convey multiple plausible explanations rather than a single definitive one.
Abstract
We review generalized additive models as a type of "transparent" model that has recently seen renewed interest in the deep learning community as neural additive models. We highlight multiple types of nonidentifiability in this model class and discuss challenges in interpretability, arguing for restraint when claiming "interpretability" or "suitability for safety-critical applications" of such models.
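The two sources of nonidentifiability mentioned above (sum-only observations and concurvity) can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not the authors' code: the synthetic data, the target y = 2·x1³, and the helper names `predict` and `centered_contributions` are all hypothetical. It builds three additive explanations that produce identical predictions. One moves a constant between shape functions; the other exploits concurvity by making x2 an exact function of x1. Mean-centering each shape function, a common identifiability convention, resolves the first ambiguity but not the second.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(-1.0, 1.0, n)
# Concurvity (illustrative assumption): x2 is an exact nonlinear function of x1.
x2 = x1 ** 3
features = [x1, x2]


def predict(shape_fns, xs):
    """Additive model output: the sum of per-feature shape functions f_j(x_j)."""
    return sum(f(x) for f, x in zip(shape_fns, xs))


def centered_contributions(shape_fns, xs):
    """Hypothetical helper: shift each shape function to mean zero on the data
    and collect the removed means into an intercept. Returns (intercept, list)."""
    raw = [f(x) for f, x in zip(shape_fns, xs)]
    intercept = sum(r.mean() for r in raw)
    return intercept, [r - r.mean() for r in raw]


c = 5.0
# Three hypothetical explanations of the same target y = 2 * x1**3.
expl_a = [lambda x: 2.0 * x ** 3, lambda x: np.zeros_like(x)]          # all credit to x1
expl_b = [lambda x: 2.0 * x ** 3 + c, lambda x: np.zeros_like(x) - c]  # constant moved between terms
expl_c = [lambda x: np.zeros_like(x), lambda x: 2.0 * x]                # all credit to x2

pred_a, pred_b, pred_c = (predict(e, features) for e in (expl_a, expl_b, expl_c))
print("A vs B predictions match:", np.allclose(pred_a, pred_b))
print("A vs C predictions match:", np.allclose(pred_a, pred_c))

# Centering removes the constant-shift ambiguity (A vs B) ...
_, contrib_a = centered_contributions(expl_a, features)
_, contrib_b = centered_contributions(expl_b, features)
_, contrib_c = centered_contributions(expl_c, features)
print("A vs B centered f1 match:", np.allclose(contrib_a[0], contrib_b[0]))
# ... but not the concurvity ambiguity (A vs C): the shape of f1 differs entirely.
print("A vs C centered f1 match:", np.allclose(contrib_a[0], contrib_c[0]))
```

All three explanations fit the data equally well, so predictive accuracy alone cannot tell a user which shape function is "the" effect of x1. Exact concurvity is an extreme case chosen for clarity; with merely correlated features the ambiguity is partial rather than total.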
