On the Relationship Between Interpretability and Explainability in Machine Learning

Benjamin Leblanc, Pascal Germain

TL;DR

This paper reframes interpretability and explainability in ML as complementary perspectives rather than substitutes, particularly for high-stakes decisions. It defines both concepts, analyzes the flaws of explaining black-box models and of relying on interpretable predictors alone, and introduces the notion of an Explained Interpretable Predictor to show how combining the two approaches can mitigate their key drawbacks. The authors argue that explanations are more trustworthy when grounded in interpretable predictors, and that such explanations reduce misalignment between explainer and explainee and can improve robustness and fairness. The work advocates integrating interpretability and explainability in research and practice to enhance the reliability, accountability, and governance of ML systems.

Abstract

Interpretability and explainability have gained increasing attention in machine learning because they are crucial for high-stakes decisions and for troubleshooting. Since both provide information about predictors and their decision process, they are often seen as two independent means to a single end. This view has led to a dichotomous literature: on one side, explainability techniques designed for complex black-box models; on the other, interpretable approaches that ignore the many available explainability tools. In this position paper, we challenge the common idea that interpretability and explainability are substitutes for one another by listing their principal shortcomings and discussing how each mitigates the drawbacks of the other. In doing so, we call for a new perspective on interpretability and explainability, and for work that targets both topics simultaneously, leveraging their respective strengths.

Paper Structure (9 sections, 3 figures, 1 table)

Introduction
Defining and Discussing the Concepts at Hand
Interpretability
Explainability
The relationship between interpretability and explainability
The Flaws in Explaining Black-Boxes
The Flaws of Interpretable Predictors
Explained Interpretable Predictor
Conclusion

Figures (3)

Figure 1: Adapted from Adebayo et al. (2018). Two predictors were trained on the MNIST task, one with the true labels and one with random labels. Both produce similar saliency maps for similar inputs, even though the second one cannot have learned anything about how images relate to their true labels.
Figure 2: Anscombe's quartet.
Figure 3: Trends in interpretability and explainability (XAI) research popularity.
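
A toy sketch of one mechanism that can make saliency maps look alike regardless of what a model learned, as in Figure 1. It rests on loud assumptions: a linear score replaces the trained networks, two random weight arrays stand in for the two predictors, and Gradient x Input is a swapped-in saliency method (the caption does not say which method the figure uses). Because the gradient of a linear score is its weight array, the map is weights * x, which is zero wherever the input is zero. The image, the weights, and the helpers `gradient_times_input` and `support_jaccard` are hypothetical.

```python
import numpy as np


def gradient_times_input(weights: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Gradient x Input saliency for a linear score s(x) = sum(weights * x).

    The gradient of s with respect to x is `weights`, so the saliency map
    is simply the element-wise product weights * x.
    """
    return weights * x


def support_jaccard(a: np.ndarray, b: np.ndarray, tol: float = 1e-12) -> float:
    """Jaccard index between the sets of pixels where each map is non-zero."""
    sa = np.abs(a) > tol
    sb = np.abs(b) > tol
    return float((sa & sb).sum() / (sa | sb).sum())


rng = np.random.default_rng(0)

# Hypothetical 28x28 "image": a bright vertical stroke on a zero background,
# loosely mimicking an MNIST digit.
x = np.zeros((28, 28))
x[8:20, 12:16] = rng.uniform(0.5, 1.0, size=(12, 4))

# Two unrelated weight arrays (assumption): one stands in for a model trained
# on real labels, the other for a model that learned nothing useful.
w_a = rng.normal(size=(28, 28))
w_b = rng.normal(size=(28, 28))

map_a = gradient_times_input(w_a, x)
map_b = gradient_times_input(w_b, x)

# Both maps vanish on the background and are non-zero exactly on the stroke,
# so their supports coincide even though the weights are unrelated.
print(support_jaccard(map_a, map_b))  # 1.0
```

A support overlap of 1.0 here says nothing about what either "model" learned; the shared structure comes from the input alone, which is the kind of gap between visual similarity and model quality the figure points to.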
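
The caption of Figure 2 does not spell out the point being made; Anscombe's quartet is commonly used to show that very different datasets can share the same summary statistics and the same fitted line. The sketch below, assuming NumPy is available, recomputes those statistics from the published values (Anscombe, 1973); the names `QUARTET` and `summarize` are illustrative.

```python
import numpy as np

# Anscombe's quartet: datasets I-III share the same x values; IV differs.
X_123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
X_4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)

QUARTET = {
    "I": (X_123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (X_123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (X_123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (X_4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}


def summarize(x: np.ndarray, y: np.ndarray) -> dict:
    """Sample statistics and least-squares line y = intercept + slope * x."""
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(x, y, deg=1)
    return {
        "mean_x": x.mean(),
        "var_x": x.var(ddof=1),
        "mean_y": y.mean(),
        "var_y": y.var(ddof=1),
        "corr": np.corrcoef(x, y)[0, 1],
        "slope": slope,
        "intercept": intercept,
    }


for name, (x, y) in QUARTET.items():
    s = summarize(x, y)
    print(f"{name:>3}: " + ", ".join(f"{k}={v:.2f}" for k, v in s.items()))
```

Across the four datasets the means, variances, correlation (about 0.816) and least-squares line (about y = 3.00 + 0.500x) agree to roughly two decimal places, yet plotting them reveals a noisy line, a smooth curve, a line pulled by one outlier, and a fit driven by a single high-leverage point.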
