Investigating the Duality of Interpretability and Explainability in Machine Learning
Moncef Garouani, Josiane Mothe, Ayah Barhrhouj, Julien Aligon
TL;DR
The paper tackles the interpretability-explainability duality in ML by arguing for inherently interpretable, hybrid models that embed symbolic domain knowledge during training. It surveys Explainable AI approaches and their limitations. It then presents empirical evidence from two case studies showing that symbolic knowledge injection can sustain or improve accuracy under data scarcity and, as SHAP analyses indicate, enhance reliability. The work highlights the complementary nature of explainability and interpretability, advocating a context-driven, hybrid strategy to foster trust and accountability in high-stakes applications. Overall, it suggests that integrating symbolic reasoning with neural predictors can bridge the gap between theoretical transparency and practical, trustworthy AI deployment.
Abstract
The rapid evolution of machine learning (ML) has led to the widespread adoption of complex "black box" models, such as deep neural networks and ensemble methods. These models exhibit exceptional predictive performance, making them invaluable for critical decision-making across diverse domains within society. However, their inherently opaque nature raises concerns about transparency and interpretability, rendering them untrustworthy as decision support systems. To lower this barrier to high-stakes adoption, the research community has focused on developing methods that explain black box models, rather than on developing models that are inherently interpretable. Designing inherently interpretable models from the outset, however, can pave the way toward responsible and beneficial applications of ML. In this position paper, we clarify the chasm between explaining black boxes and adopting inherently interpretable models. We emphasize the pressing need for model interpretability and, with the aim of obtaining better (i.e., more effective or efficient with respect to predictive performance) and trustworthy predictors, provide an experimental evaluation of recent hybrid learning methods that integrate symbolic knowledge into neural network predictors. We demonstrate how interpretable hybrid models could potentially supplant black box ones in different domains.
