Concept-Based Explainable Artificial Intelligence: Metrics and Benchmarks

Halil Ibrahim Aysel, Xiaohao Cai, Adam Prugel-Bennett

TL;DR

This work critiques concept-based XAI approaches by demonstrating that important concepts identified by post-hoc CBMs often do not exist in images or are poorly localized. It introduces three metrics, the concept global importance metric (CGIM), the concept existence metric (CEM), and the concept location metric (CLM), together with a visualization tool, concept activation mapping (CoAM), to quantify and visualize alignment between human concepts and neural representations. Using the Caltech-UCSD Birds (CUB) dataset and reproduced post-hoc CBMs, the authors show significant gaps in both global and local explainability and discuss root causes such as concept correlations. The findings argue for more rigorous evaluation criteria and benchmarks to drive the development of concept-based explanations that are spatially grounded and reliable in real-world settings.

Abstract

Concept-based explanation methods, such as concept bottleneck models (CBMs), aim to improve the interpretability of machine learning models by linking their decisions to human-understandable concepts, under the critical assumption that such concepts can be accurately attributed to the network's feature space. However, this foundational assumption has not been rigorously validated, mainly because the field lacks standardised metrics and benchmarks to assess the existence and spatial alignment of such concepts. To address this, we propose three metrics: the concept global importance metric, the concept existence metric, and the concept location metric, together with a technique for visualising concept activations, i.e., concept activation mapping. We benchmark post-hoc CBMs to illustrate their capabilities and challenges. Through qualitative and quantitative experiments, we demonstrate that, in many cases, even the most important concepts determined by post-hoc CBMs are not present in input images; moreover, when they are present, their saliency maps fail to align with the expected regions, either activating across an entire object or misidentifying relevant concept-specific regions. We analyse the root causes of these limitations, such as the natural correlation of concepts. Our findings underscore the need for more careful application of concept-based explanation techniques, especially in settings where spatial interpretability is critical.

Paper Structure

This paper contains 25 sections, 9 equations, 4 figures, 8 tables, 2 algorithms.

Figures (4)

  • Figure 1: Overview of CAVs, CBMs, post-hoc CBMs and the proposed techniques. Feature extractor ⓐ, concept prediction block ⓑ, CAVs ⓒ, concept bottleneck ⓓ, classifier ⓔ, and our proposed CoAM framework ⓕ. A traditional classification model (without a concept bottleneck) consists of ⓐ + ⓔ, and ⓒ is introduced post hoc to explain its predictions via CAVs. ⓐ + ⓑ + ⓓ + ⓔ form the steps for training traditional CBMs, whereas ⓐ + ⓒ + ⓓ + ⓔ form post-hoc CBMs. Our proposed CoAM framework is ⓕ, which weights pre-GAP feature maps with CAVs for concept visualisation. ⓖ presents example steps of our proposed metrics.
  • Figure 2: Histograms of the CGIM scores of the post-hoc CBMs. Plots in the left and right columns show the results for classes and concepts, respectively. A full list of the CGIM scores can be found in Table \ref{table:cos_sim_full} for the concepts and in Tables \ref{table:cos_sim_class} and \ref{table:cos_sim_class_continued} for the classes in the Appendix.
  • Figure 3: Randomly selected test images from different classes and the top 5 most important concepts for their classification by the post-hoc CBMs. In particular, symbols ✓ and ✗ are for concept existence and absence in the ground-truth label, respectively.
  • Figure 4: Class and concept visualisation with our CoAM. All images (on the left) are correctly classified and their class-wise saliency maps are given on the right. The four most important concepts under CEM for the given classifications and their individual saliency maps are given in the middle. In particular, symbols ✓ and ✗ are for concept existence and absence in the ground-truth label, respectively.
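
The Figure 1 caption describes CoAM as weighting pre-GAP feature maps with CAVs. A minimal sketch of that idea follows, assuming a CAM-style weighted sum over channels; the exact formulation is not given in this summary, so the function name `concept_activation_map`, the array shapes, and the random inputs are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def concept_activation_map(feature_maps, cav):
    """Weight each pre-GAP channel by the matching CAV entry and sum over channels.

    Assumed shapes (hypothetical): feature_maps is (C, H, W), cav is (C,).
    Returns an (H, W) saliency map for the concept.
    """
    feature_maps = np.asarray(feature_maps, dtype=float)
    cav = np.asarray(cav, dtype=float)
    if feature_maps.ndim != 3 or cav.shape != (feature_maps.shape[0],):
        raise ValueError("expected feature_maps of shape (C, H, W) and cav of shape (C,)")
    # Contract the channel axis: map[h, w] = sum_c cav[c] * feature_maps[c, h, w]
    return np.tensordot(cav, feature_maps, axes=1)


# Illustrative inputs only: random stand-ins for real feature maps and a learned CAV.
rng = np.random.default_rng(0)
features = rng.random((8, 7, 7))   # 8 channels on a 7x7 spatial grid
cav = rng.standard_normal(8)       # one concept direction in channel space

coam = concept_activation_map(features, cav)

# Score obtained by projecting the globally average-pooled features onto the CAV.
concept_score = cav @ features.mean(axis=(1, 2))
```

Because global average pooling is linear, the spatial mean of this map equals the projection of the pooled feature vector onto the CAV, so under this assumed formulation the map can be read as a spatial decomposition of the concept score.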