ML Interpretability: Simple Isn't Easy
Tim Räz
TL;DR
The paper argues that ML interpretability is not a single, monolithic notion but a graded property that depends on how a predictor f is represented and understood. It reframes interpretability as functional interpretability, that is, understanding the input-output behavior of f, and analyzes four interpretable model families (linear models, CART, MARS, and GAMs) to show how interpretability arises and how it changes as models become more general. It identifies four dimensions that influence interpretability and distinguishes two paradigms (linear and tree-based), with MARS occupying a middle ground between them and GAMs illustrating additive nonparametric components. The findings offer a framework for explaining predictor functions globally and have implications for xAI and black-box models: they suggest a combined approach that uses both formal representations and visualization, and they outline directions for extending the analysis to other ML paradigms.
Abstract
The interpretability of ML models is important, but it is not clear what it amounts to. So far, most philosophers have discussed the lack of interpretability of black-box models such as neural networks, and methods such as explainable AI that aim to make these models more transparent. The goal of this paper is to clarify the nature of interpretability by focussing on the other end of the 'interpretability spectrum'. I examine why some models, namely linear models and decision trees, are highly interpretable, and how more general models, namely MARS and GAM, retain some degree of interpretability. I find that while there is heterogeneity in how we gain interpretability, what interpretability is in particular cases can be explicated in a clear manner.
