
Towards Compositional Interpretability for XAI

Sean Tull, Robin Lorenz, Stephen Clark, Ilyas Khan, Bob Coecke

TL;DR

An approach to defining AI models and their interpretability based on category theory is presented, using the notion of a compositional model, which sees a model in terms of formal string diagrams that capture its abstract structure together with its concrete implementation.

Abstract

Artificial intelligence (AI) is currently based largely on black-box machine learning models which lack interpretability. The field of eXplainable AI (XAI) strives to address this major concern, which is critical in high-stakes areas such as the finance, legal and health sectors. We present an approach to defining AI models and their interpretability based on category theory. For this we employ the notion of a compositional model, which sees a model in terms of formal string diagrams that capture its abstract structure together with its concrete implementation. This comprehensive view incorporates deterministic, probabilistic and quantum models. We compare a wide range of AI models as compositional models, including linear and rule-based models, (recurrent) neural networks, transformers, VAEs, and causal and DisCoCirc models. Next we give a definition of interpretation of a model in terms of its compositional structure, demonstrating how to analyse the interpretability of a model, and using this to clarify common themes in XAI. We find that what makes the standard 'intrinsically interpretable' models so transparent is brought out most clearly diagrammatically. This leads us to the more general notion of compositionally-interpretable (CI) models, which additionally include, for instance, causal, conceptual space, and DisCoCirc models. We next demonstrate the explainability benefits of CI models. Firstly, their compositional structure may allow the computation of other quantities of interest, and may facilitate inference from the model to the modelled phenomenon by matching its structure. Secondly, they allow for diagrammatic explanations of their behaviour, based on influence constraints, diagram surgery and rewrite explanations. Finally, we discuss many future directions for the approach, raising the question of how to learn such meaningfully structured models in practice.
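
To make the notion of a compositional model concrete, the sketch below is our own illustration, not code from the paper; all names are hypothetical. It separates a model's abstract structure (a typed sequence of boxes, standing in for a string diagram) from its concrete implementation (a mapping that assigns a function to each box), in the spirit of interpreting a diagram in Set.

```python
# Minimal sketch of a compositional model: an abstract, string-diagram-like
# structure (typed boxes composed in sequence) together with a concrete
# semantics assigning a function to each box. All names are illustrative.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Box:
    """An abstract process with named input and output types."""
    name: str
    dom: str  # input type
    cod: str  # output type


@dataclass
class Diagram:
    """A sequential composite; each box's output type must match the next input."""
    boxes: List[Box]

    def __post_init__(self):
        for f, g in zip(self.boxes, self.boxes[1:]):
            assert f.cod == g.dom, f"type mismatch: {f.name} ; {g.name}"


def evaluate(diagram: Diagram, semantics: Dict[str, Callable], x):
    """Interpret the diagram by composing the functions assigned to its boxes."""
    for box in diagram.boxes:
        x = semantics[box.name](x)
    return x


# Abstract structure: embed ; layer ; classify.
model = Diagram([
    Box("embed", "Text", "Vector"),
    Box("layer", "Vector", "Vector"),
    Box("classify", "Vector", "Label"),
])

# One concrete implementation of the same diagram.
semantics = {
    "embed": lambda text: [float(len(w)) for w in text.split()],
    "layer": lambda v: [max(0.0, xi - 2.0) for xi in v],  # e.g. a shifted ReLU
    "classify": lambda v: "long words" if sum(v) > 5 else "short words",
}

print(evaluate(model, semantics, "Alice is in the garden"))  # -> 'long words'
```

Swapping in a different semantics dictionary reinterprets the same diagram, which is exactly the separation between abstract structure and concrete implementation that the abstract describes.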

Paper Structure

This paper contains 27 sections, 9 equations, and 6 figures.

Figures (6)

  • Figure 1: String diagrams for (a) a decision tree, (b) a neural network with layers of sizes 3 and 2, (c) a simplified transformer, and (d) a causal model.
  • Figure 2: Diagrams for text representations in (a) a recurrent neural network, (b) a DisCoCat model, and (c) a DisCoCirc model.
  • Figure 3: Meaningful processes in various frameworks. (a) Applying inputs to a simple input-output model. (b) Conditional probability $P(Y | X, Z)$ in a statistical model. (c) Do-intervention on a causal model. (d) Counterfactual distribution for a functional causal model $\mathbb{M} = \mathbb{F} \circ \mathbb{L}$, where $\mathbb{F}$ and $\mathbb{L}$ denote the deterministic part for the endogenous variables and the product distribution over the exogenous variables, respectively.
  • Figure 4: (a) Example argument showing that $A$ cannot influence $D$ for a model of the left-hand form. (b) Illustration of diagram surgery, in which we replace the component $f$ of a diagram with $f'$ (a toy code sketch follows this list).
  • Figure 5: Toy examples of rewrite explanations, in which the equations used are implicit in the rewrite steps. (a) A DisCoCirc-type model, where we explain why, for the text Alice is with Bob in the garden, the question Where is Alice? returns as its answer the location of the garden. The equations used in the rewriting express that if X is in/with Y, then the answer to Where is X? is simply (the location of) Y. (b) A conceptual-space-type model, using the information that yellow bananas are typically sweet to explain why they are output as tasty. The equation implicit in the first rewrite captures that a yellow banana is also sweet; the second states that sweetness on its own ensures tastiness.
  • ...and 1 more figure
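
As a toy illustration of the diagram surgery mentioned for Figure 4(b), the following is a minimal sketch under our own assumptions, not the paper's code: one component of a composed model is replaced while the surrounding wiring stays fixed, and a do-intervention (Figure 3(c)) appears as the special case where the replaced mechanism is a constant.

```python
# Toy illustration of diagram surgery (cf. Figure 4(b)): swap one component of
# a composed model for another while keeping the surrounding wiring fixed.
# A do-intervention (cf. Figure 3(c)) reads the same way, with the replaced
# mechanism set to a constant. All names here are illustrative.

from typing import Callable, Dict


def run(components: Dict[str, Callable], x):
    """Evaluate the fixed wiring g . f on input x."""
    return components["g"](components["f"](x))


original = {"f": lambda x: x + 1, "g": lambda y: 2 * y}
print(run(original, 3))  # g(f(3)) = 2 * 4 = 8

# Surgery: replace f with a new component f', leaving g and the wiring intact.
patched = dict(original, f=lambda x: x * x)
print(run(patched, 3))  # g(9) = 18

# Do-intervention as a special case: replace f with a constant, do(f := 10).
intervened = dict(original, f=lambda x: 10)
print(run(intervened, 3))  # g(10) = 20, independent of the input
```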

Theorems & Definitions (1)

  • Definition 1