A Framework for Causal Concept-based Model Explanations

Anna Rodum Bjøru, Jacob Lysnæs-Larsen, Oskar Jørgensen, Inga Strümke, Helge Langseth

TL;DR

The paper addresses the opacity of models used in high-stakes settings by proposing a causal concept-based XAI framework that combines a human-understandable concept vocabulary with a causal model to produce faithful explanations. Local and global explanations are generated by computing the probability of sufficiency of concept interventions, which supports counterfactual and contrastive reasoning. The method is demonstrated in a proof-of-concept on CelebA that uses a StarGAN-based concept-to-data mapping, with attention to both the fidelity and the interpretability of the resulting explanations. The work lays a foundation for integrating expert knowledge into causal models to detect spurious correlations and produce more trustworthy model explanations, and it identifies avenues for future work in concept detection, abstraction levels, and validation across domains.

Abstract

This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI), based on the requirements that explanations for non-interpretable models should be understandable as well as faithful to the model being explained. Local and global explanations are generated by calculating the probability of sufficiency of concept interventions. Example explanations are presented, generated with a proof-of-concept model made to explain classifiers trained on the CelebA dataset. Understandability is demonstrated through a clear concept-based vocabulary, subject to an implicit causal interpretation. Fidelity is addressed by highlighting important framework assumptions, stressing that the context of explanation interpretation must align with the context of explanation generation.
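
To make "probability of sufficiency of concept interventions" concrete, the following is a minimal sketch on a toy structural causal model. It is not the paper's implementation: the concepts `a` and `b`, the stand-in classifier `y_hat`, the noise probabilities in `P_U`, and the helper names `run_scm` and `probability_of_sufficiency` are all hypothetical. The conditioning set follows Pearl's standard definition (the intervened concept currently differs from the intervention value and the output is not yet the target), which may differ in detail from the paper's formulation.

```python
from itertools import product

# Hypothetical toy SCM: two binary concepts and a stand-in classifier output.
# Each exogenous noise term is an independent Bernoulli variable, P(u = 1) below.
P_U = {"u_a": 0.5, "u_b": 0.8, "u_y": 0.9}


def run_scm(u, do=None):
    """Evaluate the SCM for one noise setting `u`, with optional interventions `do`."""
    do = do or {}
    a = do.get("a", u["u_a"])               # concept a depends on its noise only
    b = do.get("b", int(a and u["u_b"]))    # concept b is caused by a, gated by noise
    y_hat = int(b and u["u_y"])             # stand-in classifier responds to b
    return {"a": a, "b": b, "y_hat": y_hat}


def probability_of_sufficiency(do, target=1):
    """PS = P(y_hat under do(...) == target | concept != intervened value, y_hat != target).

    Computed exactly by enumerating all noise settings (abduction),
    applying the intervention (action), and re-evaluating (prediction).
    """
    num = den = 0.0
    for values in product([0, 1], repeat=len(P_U)):
        u = dict(zip(P_U, values))
        weight = 1.0
        for name, value in u.items():
            weight *= P_U[name] if value else 1 - P_U[name]
        factual = run_scm(u)
        # Assumption: keep only units where every intervened concept actually
        # changes and the factual output is not already the target.
        if any(factual[k] == v for k, v in do.items()) or factual["y_hat"] == target:
            continue
        den += weight
        if run_scm(u, do)["y_hat"] == target:
            num += weight
    return num / den if den > 0 else float("nan")


ps_a = probability_of_sufficiency({"a": 1})  # intervene on the upstream concept
ps_b = probability_of_sufficiency({"b": 1})  # intervene on the downstream concept
```

With these made-up probabilities, `ps_a` evaluates to 0.72 and `ps_b` to 0.9: setting `b` directly is more often sufficient to flip the output than setting its cause `a`, because the effect of `a` must still pass through the noisy mechanism for `b`.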

Paper Structure

This paper contains 42 sections, 6 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: The graph details the complete causal framework over variables $\mathbf{u}\cup\mathbf{z}\cup\mathbf{w}\cup\mathbf{x}\cup\{{\hat{\mathrm{y}}}\}$ that is presented in this section. Variables $\mathbf{x}$ and ${\hat{\mathrm{y}}}$ are included with gray background, as these are observed variables from the point of view of $h$, the model to be explained. Variables $\mathbf{z}, \mathbf{w}, \mathbf{u}$ are generally unobserved.
  • Figure 2: The figure details an example causal graph over concept variables $\mathrm{z}_1, \mathrm{z}_2, \mathrm{z}_3, \mathrm{z}_4, \mathrm{z}_5$, each with an exogenous parent $\mathrm{u}_i$.
  • Figure 3: The causal model over concepts relevant for the age classifier.
  • Figure 4: The causal model over concepts relevant for the attractiveness classifier.
  • Figure 5: Example images from the dataset together with several counterfactual variations. From left to right, the columns contain the original, followed by counterfactuals with $\textsf{Glasses}=1$, $\textsf{Gray Hair}=1$, $\textsf{Young}=0$, and $\textsf{Gray Hair}=1$ & $\textsf{Young}=0$.
  • ...and 3 more figures