Explanation Beyond Intuition: A Testable Criterion for Inherent Explainability

Michael Merry, Pat Riddle, Jim Warren

TL;DR

The paper defines a formal, graph-based criterion for inherent explainability in AI by decomposing models into structure-local subgraphs annotated with verifiable hypotheses and evidence. It introduces leaf and composition annotations within a hierarchical framework that enforces both structural and compositional coverage, with two verification modes (analytical and empirical). Through the PREDICT Cox proportional hazards cardiovascular risk model, it demonstrates full coverage and verifiable explanations across multiple audiences, arguing that this approach enables regulatory-compliant, auditable explainability especially for critical domains. The work also analyzes limitations for very large dense networks and highlights the potential for hierarchical grouping to preserve explainability in high-dimensional settings. Overall, it provides a rigorous blueprint for moving beyond intuition toward auditable, multi-audience explanations in real-world applications.

Abstract

Inherent explainability is the gold standard in Explainable Artificial Intelligence (XAI). However, there is no consistent definition of inherent explainability, nor a test to demonstrate it. Work to date either characterises explainability through metrics or appeals to intuition: "we know it when we see it". We propose a globally applicable criterion for inherent explainability. The criterion uses graph theory to represent models and decompose them for structure-local explanation, and then to recompose those local explanations into global explanations. We form the structure-local explanations as annotations, a verifiable hypothesis-evidence structure that allows for a range of explanatory methods to be used. This criterion matches existing intuitions on inherent explainability, and justifies why a large regression model may not be explainable but a sparse neural network could be. We differentiate explainable (a model that allows for explanation) from explained (one that has a verified explanation). Finally, we provide a full explanation of PREDICT, a Cox proportional hazards model of cardiovascular disease risk that is in active clinical use in New Zealand. It follows that PREDICT is inherently explainable. This work provides structure to formalise other work on explainability, and allows regulators a flexible but rigorous test that can be used in compliance frameworks.

Paper Structure

This paper contains 91 sections, 7 equations, 1 figure.

Figures (1)

  • Figure 1: Overview of the PREDICT cardiovascular risk model explanation structure, showing the hierarchical organization of leaf annotations, composition annotations, and global composition.

Theorems & Definitions (14)

  • Definition 1: Explanation
  • Definition 2: Explainable
  • Definition 3: Verifiability
  • Definition 4: Hypothesis-Evidence Structure
  • Definition 5: Computational Graph
  • Definition 6: Annotation
  • Definition 7: Leaf Annotation
  • Definition 8: Composition Annotation
  • Definition 9: Annotation Hierarchy
  • Definition 10: Structural Coverage
  • ...and 4 more
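
To make the listed notions of leaf annotation, structural coverage, and the explainable/explained distinction more concrete, here is a minimal Python sketch. It is not the paper's formalism: the class `LeafAnnotation`, the functions `uncovered_nodes`, `has_structural_coverage`, and `is_explained`, and the toy node names are all hypothetical, and the sketch assumes that structural coverage means every node of the computational graph belongs to at least one annotated subgraph. Composition annotations and compositional coverage are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafAnnotation:
    """Hypothetical leaf annotation: a hypothesis about one subgraph, plus evidence."""
    nodes: frozenset      # nodes of the subgraph this annotation explains
    hypothesis: str       # what the subgraph is claimed to compute
    evidence: str         # analytical or empirical support for the hypothesis
    verified: bool = False


def uncovered_nodes(graph_nodes, annotations):
    """Return the graph nodes that no annotation covers."""
    covered = set()
    for annotation in annotations:
        covered |= annotation.nodes
    return set(graph_nodes) - covered


def has_structural_coverage(graph_nodes, annotations):
    """Assumed reading of structural coverage: no node is left unannotated."""
    return not uncovered_nodes(graph_nodes, annotations)


def is_explained(graph_nodes, annotations):
    """Assumed reading of 'explained': full coverage and every annotation verified."""
    return has_structural_coverage(graph_nodes, annotations) and all(
        annotation.verified for annotation in annotations
    )


# Toy graph with made-up node names (not the PREDICT model).
toy_nodes = {"x1", "x2", "weighted_sum", "risk"}

inputs_annotation = LeafAnnotation(
    nodes=frozenset({"x1", "x2", "weighted_sum"}),
    hypothesis="weighted_sum is a linear combination of x1 and x2",
    evidence="analytical: read directly from the coefficients",
    verified=True,
)
output_annotation = LeafAnnotation(
    nodes=frozenset({"risk"}),
    hypothesis="risk is a monotone transform of weighted_sum",
    evidence="empirical: not yet checked",
    verified=False,
)

partial = [inputs_annotation]
full = [inputs_annotation, output_annotation]
```

With only `partial`, the `risk` node is uncovered, so the toy graph fails the coverage check. With `full`, coverage holds, but `is_explained` stays false until the second annotation is verified, which mirrors the abstract's distinction between a model that allows for explanation and one that has a verified explanation.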