From Attribution Maps to Human-Understandable Explanations through Concept Relevance Propagation

Reduan Achtibat; Maximilian Dreyer; Ilona Eisenbraun; Sebastian Bosse; Thomas Wiegand; Wojciech Samek; Sebastian Lapuschkin

TL;DR

The paper addresses the gap between local attribution maps and global concept visualizations in XAI. It proposes Concept Relevance Propagation (CRP), which fuses the two perspectives to answer both the "where" and the "what" question for individual predictions. CRP extends Layer-wise Relevance Propagation (LRP) with conditional relevance flows tied to learned concepts, and introduces Relevance Maximization (RelMax) to select reference samples that reflect how the model actually uses a concept rather than mere activation strength. The authors demonstrate that CRP produces human-understandable concept atlases and concept composition graphs, enabling detailed analyses of concept composition, impact, and subspaces, including time-series and fairness-oriented investigations. A human study shows that CRP-based explanations significantly improve primary task accuracy over standard attribution methods, supporting its practical value for debugging, safety-critical decision-making, and scientific discovery, where interpretable model reasoning is essential. Overall, CRP provides a scalable, post-hoc, model-agnostic framework that enhances interpretability by making latent concepts tangible and searchable within the input domain.
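The difference between activation-based and relevance-based selection of reference samples can be illustrated with a minimal sketch. The per-sample numbers below are invented toy values for a single hidden channel, and `top_k_indices` is a hypothetical helper, not part of any CRP implementation.

```python
from typing import List, Sequence


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy statistics for one hidden channel over five samples (assumed values).
activations = [0.9, 0.2, 0.8, 0.7, 0.1]   # how strongly the channel fires
relevances = [0.05, 0.6, 0.1, 0.7, 0.0]   # how much it contributed to the prediction

# Activation-based selection picks the samples that excite the channel most.
activation_refs = top_k_indices(activations, k=2)

# Relevance-based selection picks the samples where the channel mattered most.
relevance_refs = top_k_indices(relevances, k=2)

print(activation_refs, relevance_refs)
```

In this toy setting the two rankings disagree: samples 0 and 2 activate the channel strongly but contribute little to the output, which is the situation relevance-based selection is meant to expose.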

Abstract

The field of eXplainable Artificial Intelligence (XAI) aims to bring transparency to today's powerful but opaque deep learning models. While local XAI methods explain individual predictions in the form of attribution maps, thereby identifying where important features occur (but not providing information about what they represent), global explanation techniques visualize what concepts a model has generally learned to encode. Both types of methods thus provide only partial insights and leave the burden of interpreting the model's reasoning to the user. In this work we introduce the Concept Relevance Propagation (CRP) approach, which combines the local and global perspectives and thus allows answering both the "where" and "what" questions for individual predictions. We demonstrate the capability of our method in various settings, showcasing that CRP leads to more human-interpretable explanations and provides deep insights into the model's representation and reasoning through concept atlases, concept composition analyses, and quantitative investigations of concept subspaces and their role in fine-grained decision making.
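To make the idea of a concept-conditional explanation concrete, the following is a minimal sketch on a toy two-layer ReLU network without biases. It uses a basic LRP-style proportional redistribution rule and restricts the backward relevance flow to selected hidden channels. The network, its weights, and the function `lrp_toy` are assumptions made for illustration and are not the authors' implementation.

```python
from typing import List, Optional, Set


def lrp_toy(x: List[float], W1: List[List[float]], w2: List[float],
            keep: Optional[Set[int]] = None) -> List[float]:
    """Input relevances for y = w2 . relu(W1 x), optionally conditioned on hidden channels."""
    # Forward pass.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W1]
    h = [max(0.0, zj) for zj in z]

    # Relevance of each hidden channel for the output (proportional rule, no bias).
    r_hidden = [hj * wj for hj, wj in zip(h, w2)]

    # Conditioning step: zero the relevance of all channels outside `keep`.
    if keep is not None:
        r_hidden = [r if j in keep else 0.0 for j, r in enumerate(r_hidden)]

    # Redistribute hidden relevance to the inputs in proportion to x_i * w_ji.
    r_in = [0.0] * len(x)
    for j, row in enumerate(W1):
        if h[j] == 0.0:
            continue  # inactive channels carry no relevance
        for i, (w, xi) in enumerate(zip(row, x)):
            r_in[i] += xi * w / z[j] * r_hidden[j]
    return r_in


x = [1.0, 2.0, 0.5]
W1 = [[1.0, 0.5, 0.0], [0.0, 1.0, 2.0]]
w2 = [1.0, 0.5]

full = lrp_toy(x, W1, w2)                # standard attribution over all channels
cond_0 = lrp_toy(x, W1, w2, keep={0})    # explanation conditioned on channel 0
cond_1 = lrp_toy(x, W1, w2, keep={1})    # explanation conditioned on channel 1
print(full, cond_0, cond_1)
```

Under this simple rule the channel-conditional maps add up to the unconditioned map, so the full heatmap is a superposition in which the contribution of each channel can no longer be told apart.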

Paper Structure (97 sections, 48 equations, 63 figures, 3 tables)

Introduction
Methods in Brief
Results
Understanding Concept Composition Leading to Prediction
Understanding Concept Impact and Reach
Understanding Concept Subspaces, (Dis)Similarities and Roles
Human Evaluation Study
Discussion
Methods
Concept Relevance Propagation
Selecting Reference Examples
Activation Maximization
Relevance Maximization
Comparing Feature Channels with Averaged Cosine Similarity on Reference Samples
Human Evaluation Study Details
...and 82 more sections

Figures (63)

Figure 1: Glocal XAI can tell which features exist and how they are used for predictions by unifying local and global XAI. (Left): Local explanations visualize which input pixels are relevant for the prediction. Here, the model focuses on the eye region for all three predictions. However, what features in particular the model has recognized in those regions remains open for interpretation by the user. (Right): By finding reference images that maximally represent particular (groups of) neurons, global methods give insight into the concepts generally encoded by the model. However, global methods alone do not inform which concepts are recognized, used and combined by the model in per-sample inference. (Center): Glocal XAI can identify the relevant neurons for a particular prediction (property of local XAI) and then visualize the concepts these neurons encode (property of global XAI). Further, by using concept-conditional explanations as a filter mask, the concepts' defining parts can be highlighted in the reference images, which greatly increases interpretability and clarity. Here, the topmost sample has been predicted into age group (3-7) due to the sample's large irides and round eyes, while the middle sample is predicted as (25-32), as more of the sclera is visible and eyebrows are more apparent. For the bottom sample, the model has predicted class (60+) based on its recognition of heavy wrinkles around the eyes and on the eyelids, and pronounced tear sacs next to a large knobby nose.
Figure 1: a) Class-conditional heatmap example for MNIST using LRP: Selecting one specific output class (see numbers above) for the heatmap calculation leads to explanations conveying precise meaning, namely "which features in their given state support (red) or contradict (blue) the class output", compared to choosing all output classes at once, where the meaning of the class-specific sign of the attribution is lost. For LRP, the heatmap computed for the whole output at once is a superposition of all class-specific heatmaps. b) In some cases, traditional heatmaps can still be rather uninformative despite being class-specific, as is shown for bird species classification examples. Here, heatmaps only hint at the location of relevant body parts, without specifying the (different) concepts, e.g., species-specific beak shapes, being recognized and considered by the model. A reference colormap for heatmaps shown in this paper is given in Supplementary Figure \ref{fig:appendix:color_map}.
Figure 2: Brief overview of the methodological contributions of this work. a) Traditional backpropagation-based methods such as LRP propagate relevance scores backwards through the network, culminating in a single attribution map. b) By conditioning on a concept encoded by a hidden-layer channel of the network, CRP allows computing concept-conditional explanations. c) To provide a semantic meaning for latent model structures, we propose RelMax, which visualizes input samples for which the latent structure was strongly relevant for a prediction. We can further highlight the semantics by only displaying the relevant input parts according to concept-specific explanations as introduced in b.
Figure 2: Explanation disentanglement via CRP. Target concept "dog" as expressed by an output neuron in layer $L$ is described by a combination of lower-level concepts such as "snout", "eye" and "fur" encoded in a lower layer $l$. CRP heatmaps regarding individual concepts, and their contribution to the prediction of "dog", can be generated by applying masks to filter channels in the backward pass. Global (in the context of an input sample) relevance of a concept with respect to the explained prediction can thus not only be measured in latent space, but also precisely visualized, localized and measured in input space. The concept-conditional computation of $R(\mathbf{x}|\theta_{df})$ reveals the relatively high importance of the spatially distributed "fur" feature for the prediction of "dog", compared to the feature "eye". The attribution of $R(\mathbf{x}|\theta_{df})$ in the input space visualization of $R(\mathbf{x}|\theta_{d})$ (which was computed jointly over all concepts), however, is dominated by $R(\mathbf{x}|\theta_{de})$ and $R(\mathbf{x}|\theta_{ds})$, which both concentrate more strongly on smaller image regions and attribute both to the dog's eye. Here, the visualization of $R(\mathbf{x}|\theta_{d})$ alone does not represent the relative importance of the concepts' contributions to the "dog" outcome.
Figure 3: Understanding concepts and concept composition with CRP. a) An attribution map indicates that various body parts of the bird are relevant for the prediction. b) Channel-conditional explanations computed with CRP help to localize and understand channel concepts by providing masked reference samples (explaining by example). c) CRP relevances can further be used to construct a concept atlas, visualizing which concepts dominate in specific regions in the input image defined by super-pixels. Here, the most relevant channels in layer layer3.0.conv2 can be identified with concepts "dots" (channels 210 and 130), "red spot" (10), "black eyes" (187) and "stripes-like" (19). d) Concept Composition Graphs decompose a concept of interest given a particular prediction into lower-layer concepts, thus improving concept understanding. Shown are relevant (sub-)concepts in features.24 and features.26 for concept "animal on branch" in features.28 for the prediction of class "Bee Eater". The relevance flow is highlighted in red, with the relative percentage of relevance flow to the lower-level concepts. For each concept, the channel is given with the relative global relevance score (with respect to channel 102 in features.28) in parentheses. Following the relevance flow, concept "animal on branch" is dependent on concepts describing the branch (e.g., "wood (horizontal)" and "brown, knobby") and colorful plumage (e.g., "colorful feathers" and "colorful threads"). Additional examples can be found in Supplementary Note \ref{sec:appendix:conceptlocalization}.
...and 58 more figures
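The concept atlas described in the Figure 3 caption assigns to each super-pixel the concept that dominates there. A minimal sketch of that aggregation step is given below, assuming channel-conditional heatmaps and a super-pixel segmentation are already available. The function `concept_atlas`, the four-pixel "image", and all relevance values are hypothetical; only the concept labels are borrowed from the caption.

```python
from typing import Dict, Hashable, List


def concept_atlas(cond_heatmaps: Dict[str, List[float]],
                  regions: List[Hashable]) -> Dict[Hashable, str]:
    """Map each region to the concept with the largest summed relevance inside it."""
    totals: Dict[Hashable, Dict[str, float]] = {}
    for concept, heatmap in cond_heatmaps.items():
        for relevance, region in zip(heatmap, regions):
            per_concept = totals.setdefault(region, {})
            per_concept[concept] = per_concept.get(concept, 0.0) + relevance
    # Pick the dominant concept in every region.
    return {region: max(scores, key=scores.get) for region, scores in totals.items()}


# Toy input: four pixels split into two super-pixels (region ids 0 and 1).
regions = [0, 0, 1, 1]

# Toy channel-conditional heatmaps, one relevance value per pixel (assumed numbers).
cond_heatmaps = {
    "dots": [0.4, 0.3, 0.0, 0.1],
    "stripes-like": [0.1, 0.1, 0.5, 0.2],
}

atlas = concept_atlas(cond_heatmaps, regions)
print(atlas)
```

A real atlas would operate on two-dimensional heatmaps and a segmentation algorithm of choice; the flattened lists here only keep the example short.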
