On the Relationship Between Interpretability and Explainability in Machine Learning
Benjamin Leblanc, Pascal Germain
TL;DR
This paper reframes interpretability and explainability not as substitutes but as complementary perspectives in ML, particularly for high-stakes decisions. It defines the concepts, analyzes the flaws of explaining black-box models and of relying on interpretable predictors alone, and introduces the notion of an Explained Interpretable Predictor to show how combining both approaches can mitigate key drawbacks. The authors argue that explanations are more trustworthy when grounded in interpretable predictors, that such explanations reduce misalignment between explainer and explainee, and that they can improve robustness and fairness. The work advocates integrating interpretability and explainability in research and practice to enhance the reliability, accountability, and governance of ML systems.
Abstract
Interpretability and explainability have gained increasing attention in machine learning, as they are crucial for high-stakes decisions and troubleshooting. Since both provide information about predictors and their decision process, they are often seen as two independent means to a single end. This view has led to a dichotomous literature: explainability techniques designed for complex black-box models on one side, and interpretable approaches that ignore the many explainability tools on the other. In this position paper, we challenge the common idea that interpretability and explainability are substitutes for one another by listing their principal shortcomings and discussing how each mitigates the drawbacks of the other. In doing so, we call for a new perspective on interpretability and explainability, and for works that target both topics simultaneously, leveraging their respective assets.
