The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations

Vinitra Swamy, Jibril Frej, Tanja Käser

TL;DR

Explainable AI in human-centric domains is hampered by reliance on post-hoc explanations that can be unfaithful, inconsistent, and difficult to evaluate. The authors argue for intrinsically interpretable neural networks and propose two schemes for interpretable-by-design workflows: InterpretCC, which performs adaptive routing with interpretable conditional computation, and I2MD, which performs iterative model diagnostics by comparing knowledge graphs extracted across training. They articulate five needs for human-centric XAI (real-time, accurate, actionable, human-interpretable, consistent) and present two designs to meet them, emphasizing real-time availability, consistency, and actionability for developers. This shift has the potential to improve trust in and adoption of deep learning in critical domains by guaranteeing interpretability within the modeling process rather than relying on post-hoc explanations.

Abstract

Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g. predictive tasks in healthcare, education, or personalized ads) tend to rely on a single post-hoc explainer, whereas recent work has identified systematic disagreement between post-hoc explainers when applied to the same instances of underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose a shift from post-hoc explainability to designing interpretable neural network architectures. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing with InterpretCC and temporal diagnostics with I2MD). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.

Paper Structure (7 sections, 2 figures)

This paper contains 7 sections, 2 figures.

Figures (2)

  • Figure 1: Explainability can be intrinsic (by design), in-hoc (e.g., gradient methods), or post-hoc (e.g., LIME, SHAP). Furthermore, the granularity of model explanations ranges from local (single user, a group of users) to global (entire sample).
  • Figure 2: Proposed architecture of adaptive routing with Interpretable Conditional Computation (InterpretCC, left). A discriminator layer adaptively selects feature groupings as important, then sends truncated feature sets to expert sub-networks. Example of global model benchmarks with Interpretable Iterative Model Diagnostics (I2MD, right). Knowledge graphs are extracted from a language model at iterative stages of training and compared over time with diagnostic benchmarks.
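
The routing idea in the Figure 2 caption can be illustrated with a small toy sketch. This is not the authors' implementation: the class name `GroupRoutedModel`, the feature group names, the 0.5 threshold, the random linear weights, and the fallback to the best-scoring group are all assumptions made for illustration. It only shows the general pattern of a gate selecting feature groups and each selected group being handled by its own expert, so that the set of groups used for a prediction can be read off directly.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GroupRoutedModel:
    """Toy sketch (hypothetical): a gate scores each feature group, and only
    groups whose score reaches a threshold are sent to their own expert."""

    def __init__(self, groups, threshold=0.5, seed=0):
        rng = random.Random(seed)
        self.groups = groups          # group name -> list of feature indices
        self.threshold = threshold
        n_features = 1 + max(i for idx in groups.values() for i in idx)
        # Gate ("discriminator"): one linear scorer per group over the full input.
        self.gate_w = {
            g: [rng.uniform(-1, 1) for _ in range(n_features)] for g in groups
        }
        # Experts: one linear model per group, seeing only that group's features.
        self.expert_w = {
            g: [rng.uniform(-1, 1) for _ in idx] for g, idx in groups.items()
        }

    def predict(self, x):
        # Score every group from the full input.
        gates = {
            g: sigmoid(sum(w * v for w, v in zip(ws, x)))
            for g, ws in self.gate_w.items()
        }
        # Keep only groups whose gate activation reaches the threshold.
        active = {g: a for g, a in gates.items() if a >= self.threshold}
        if not active:
            # Assumed fallback: route to the single best-scoring group.
            best = max(gates, key=gates.get)
            active = {best: gates[best]}
        total = sum(active.values())
        weights = {g: a / total for g, a in active.items()}
        # Each expert sees only the truncated feature set of its group.
        logit = 0.0
        for g, w in weights.items():
            sub_x = [x[i] for i in self.groups[g]]
            logit += w * sum(ew * v for ew, v in zip(self.expert_w[g], sub_x))
        return sigmoid(logit), weights


# Hypothetical feature groups for a five-feature input.
groups = {"activity": [0, 1], "timing": [2, 3], "content": [4]}
model = GroupRoutedModel(groups)
prob, used = model.predict([0.2, 1.0, -0.5, 0.3, 0.8])
print(prob, used)
```

The returned `used` dictionary lists which feature groups contributed to the prediction and with what normalized weight, which is the kind of by-design explanation the caption describes. A trained model would learn the gate and expert parameters rather than draw them at random.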