A Framework for Causal Concept-based Model Explanations
Anna Rodum Bjøru, Jacob Lysnæs-Larsen, Oskar Jørgensen, Inga Strümke, Helge Langseth
TL;DR
The paper addresses the opacity of models used in high-stakes settings by proposing a causal concept-based XAI framework. The framework pairs a human-understandable concept vocabulary with a causal model, so that explanations are both understandable and faithful to the model being explained. Local and global explanations are computed as the probability of sufficiency of concept interventions, which supports counterfactual and contrastive reasoning. A proof-of-concept on CelebA, using a StarGAN-based concept-to-data mapping, illustrates the resulting explanations and the assumptions on which their fidelity depends. The work lays a foundation for integrating expert knowledge into causal models to detect spurious correlations, and it identifies concept detection, the choice of abstraction level, and validation across domains as directions for future improvement.
Abstract
This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI), based on the requirements that explanations for non-interpretable models should be understandable as well as faithful to the model being explained. Local and global explanations are generated by calculating the probability of sufficiency of concept interventions. Example explanations are presented, generated with a proof-of-concept model made to explain classifiers trained on the CelebA dataset. Understandability is demonstrated through a clear concept-based vocabulary, subject to an implicit causal interpretation. Fidelity is addressed by highlighting important framework assumptions, stressing that the context of explanation interpretation must align with the context of explanation generation.
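To make the central quantity concrete, the following is a minimal sketch of the probability of sufficiency in Pearl's counterfactual sense, PS = P(Y_{do(A=1)} = 1 | A = 0, Y = 0), computed by exhaustive enumeration over the noise terms of a toy structural causal model. It is not the paper's implementation: the concepts A and B, the structural equations, the noise priors in P_U, and the function names are all hypothetical and chosen only for illustration, with Y standing in for the output of the classifier being explained.

```python
from itertools import product

# Hypothetical priors P(U = 1) for independent binary noise terms (assumed values).
P_U = {"u_a": 0.5, "u_b": 0.2, "u_y": 0.1}


def scm(u, do=None):
    """Evaluate the toy structural equations for one noise setting u.

    Assumed structure: concept A causes concept B, and the stand-in
    classifier output Y depends on B. Entries in `do` override the
    structural equation of the named concept (an intervention).
    """
    do = do or {}
    v = {}
    v["A"] = do.get("A", u["u_a"])
    v["B"] = do.get("B", int(v["A"] and not u["u_b"]))  # u_b inhibits B
    v["Y"] = int(v["B"] or u["u_y"])                    # u_y can trigger Y alone
    return v


def probability_of_sufficiency(do, outcome, evidence):
    """P(outcome holds under `do` | evidence holds in the factual world)."""
    names = list(P_U)
    num = den = 0.0
    for bits in product([0, 1], repeat=len(names)):
        u = dict(zip(names, bits))
        # Prior weight of this noise setting.
        w = 1.0
        for n in names:
            w *= P_U[n] if u[n] else 1 - P_U[n]
        # Abduction: keep only noise settings consistent with the evidence.
        factual = scm(u)
        if any(factual[k] != val for k, val in evidence.items()):
            continue
        den += w
        # Action and prediction: rerun the model under the intervention.
        counterfactual = scm(u, do)
        if all(counterfactual[k] == val for k, val in outcome.items()):
            num += w
    if den == 0:
        raise ValueError("evidence has zero probability under the model")
    return num / den


ps = probability_of_sufficiency(
    do={"A": 1}, outcome={"Y": 1}, evidence={"A": 0, "Y": 0}
)
print(round(ps, 3))  # prints 0.8
```

In this toy model the evidence fixes u_a = 0 and u_y = 0 but leaves u_b undetermined, so setting A to 1 flips the output exactly when u_b = 0, which has prior probability 0.8. The result depends entirely on the assumed causal structure, which mirrors the abstract's point that an explanation must be interpreted in the same context in which it was generated.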
