Measuring the (Un)Faithfulness of Concept-Based Explanations

Shubham Kumar, Narendra Ahuja

TL;DR

This work critiques how the faithfulness of unsupervised concept-based explanation methods (U-CBEMs) has been measured. It argues that prior measures can overstate faithfulness: complex surrogates add an unmeasured cost to the explanation's interpretability, and deletion-based proxies do not properly measure faithfulness. It introduces SURF, a simple, linear surrogate that uses concept activations and importances to predict model outputs, paired with two metrics (SURF_MAE and SURF_EMD) that gauge faithfulness across all output classes rather than only the predicted class. A measure-over-measure sanity check shows that SURF, unlike prior surrogates, reports lower faithfulness as concepts are increasingly randomized. This enables a fair benchmark of six U-CBEMs on multiple tasks, and the results reveal that many state-of-the-art U-CBEMs are not faithful. SURF also provides a principled way to choose the number of concepts, exposing how different methods trade interpretability against faithfulness. The authors suggest adopting SURF as a standard faithfulness benchmark for future work and state that the code will be released.

Abstract

Deep vision models perform input-output computations that are hard to interpret. Concept-based explanation methods (CBEMs) increase interpretability by re-expressing parts of the model with human-understandable semantic units, or concepts. Checking if the derived explanations are faithful -- that is, they represent the model's internal computation -- requires a surrogate that combines concepts to compute the output. Simplifications made for interpretability inevitably reduce faithfulness, resulting in a tradeoff between the two. State-of-the-art unsupervised CBEMs (U-CBEMs) have reported increasingly interpretable concepts, while also being more faithful to the model. However, we observe that the reported improvement in faithfulness artificially results from either (1) using overly complex surrogates, which introduces an unmeasured cost to the explanation's interpretability, or (2) relying on deletion-based approaches that, as we demonstrate, do not properly measure faithfulness. We propose Surrogate Faithfulness (SURF), which (1) replaces prior complex surrogates with a simple, linear surrogate that measures faithfulness without changing the explanation's interpretability and (2) introduces well-motivated metrics that assess loss across all output classes, not just the predicted class. We validate SURF with a measure-over-measure study by proposing a simple sanity check -- explanations with random concepts should be less faithful -- which prior surrogates fail. SURF enables the first reliable faithfulness benchmark of U-CBEMs, revealing that many visually compelling U-CBEMs are not faithful. Code to be released.
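
To make the idea of a linear surrogate scored over all output classes concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The form `softmax(C @ A)`, the function names, and the toy data are all assumptions made for illustration; the abstract only states that the surrogate is simple and linear and that the metrics cover every class. SURF_EMD is left out because this summary does not specify the ground distance it uses.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def linear_surrogate(concept_acts, importance):
    """Assumed surrogate form: y_hat = softmax(C @ A).

    concept_acts: (N, K) concept activations, one row per input.
    importance:   (K, C) importance of each concept for each class.
    """
    return softmax(concept_acts @ importance)

def mae_all_classes(y, y_hat):
    """Mean absolute error over every input and every output class."""
    return float(np.mean(np.abs(y - y_hat)))

def mae_predicted_class(y, y_hat):
    """Error on the model's predicted class only, shown for contrast."""
    rows = np.arange(len(y))
    idx = y.argmax(axis=1)
    return float(np.mean(np.abs(y[rows, idx] - y_hat[rows, idx])))

# Toy data; every value here is hypothetical.
rng = np.random.default_rng(0)
n_inputs, n_concepts, n_classes = 200, 6, 4
concept_acts = rng.normal(size=(n_inputs, n_concepts))
true_importance = rng.normal(size=(n_concepts, n_classes))
y = softmax(concept_acts @ true_importance)  # stand-in for the model output

# An explanation with exactly the right importances reproduces the output.
mae_exact = mae_all_classes(y, linear_surrogate(concept_acts, true_importance))

# An explanation with perturbed importances is less faithful.
noisy_importance = true_importance + rng.normal(scale=0.5, size=true_importance.shape)
y_hat = linear_surrogate(concept_acts, noisy_importance)
mae_noisy = mae_all_classes(y, y_hat)
mae_noisy_pred = mae_predicted_class(y, y_hat)

print(f"all-class MAE, exact importances: {mae_exact:.4f}")
print(f"all-class MAE, noisy importances: {mae_noisy:.4f}")
print(f"predicted-class-only error, noisy importances: {mae_noisy_pred:.4f}")
```

The two error functions differ only in which outputs they look at. The predicted-class variant ignores how the surrogate distributes probability over the remaining classes, which is the gap the all-class metrics are meant to close.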

Paper Structure

This paper contains 32 sections, 15 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Framework. U-CBEM faithfulness measures compare (using metric $d$) the output $\mathbf{y}$ of the original model $\phi(.) = f(g(.))$ with the output $\mathbf{\hat{y}}$ from the explanation. To obtain the explanation, U-CBEMs transform intermediate representation $\mathbf{h} \in \mathcal{R}^D$ to the concept representation through concept projection $\mathcal{P}(\mathbf{h})$ and provide an accompanying concept importance $A$. The explanation is then passed through surrogate $s$ to obtain $\mathbf{\hat{y}}$. Deletion-based proxies (right) observe model degradation after performing deletion in a deletion space. Surrogate-based measures (left) do not manipulate the computation; instead, they introduce a surrogate to directly approximate \ref{eq:faithfulness-ucbem-def}.
  • Figure 2: We fit U-CBEMs on the Object Classification task of \ref{sec:results-benchmark} with an increasing number of concepts and evaluate their faithfulness with SURF. Some U-CBEMs do not improve as the number of concepts increases. Those that do improve plateau quickly.
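
The pipeline in Figure 1 and the random-concept sanity check can be sketched on a toy model. This is an illustrative sketch under stated assumptions, not the paper's method. It assumes a linear concept projection $\mathcal{P}(\mathbf{h})$ onto orthonormal concept vectors, takes the importance $A$ to be the head weights expressed in the concept basis, assumes a softmax-of-linear surrogate $s$, and uses all-class mean absolute error as $d$. The helper names and the toy linear head are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with a max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def orthonormal_basis(mat):
    """Orthonormal basis for the column space of mat (reduced QR)."""
    q, _ = np.linalg.qr(mat)
    return q

def surrogate_mae(h, y, concepts, head):
    """d(y, y_hat) for an explanation built from `concepts`.

    P(h)  = h @ concepts            (assumed linear concept projection)
    A     = concepts.T @ head       (assumed concept importance)
    y_hat = softmax(P(h) @ A)       (assumed linear surrogate s)
    d     = mean absolute error over all classes
    """
    acts = h @ concepts
    importance = concepts.T @ head
    y_hat = softmax(acts @ importance)
    return float(np.mean(np.abs(y - y_hat)))

rng = np.random.default_rng(0)
n, dim, n_classes = 300, 32, 5
h = rng.normal(size=(n, dim))              # intermediate representation h = g(x)
head = rng.normal(size=(dim, n_classes))   # toy linear f, hypothetical
y = softmax(h @ head)                      # model output y = f(h)

# Concepts that span exactly the directions the head uses.
good = orthonormal_basis(head)
# Random concept directions of the same count.
rand = orthonormal_basis(rng.normal(size=(dim, n_classes)))

# Sanity check: swap in r random concepts at a time and re-measure.
k = n_classes
errors = []
for r in range(k + 1):
    mixed = np.concatenate([good[:, : k - r], rand[:, :r]], axis=1)
    errors.append(surrogate_mae(h, y, orthonormal_basis(mixed), head))

for r, err in enumerate(errors):
    print(f"{r} of {k} concepts randomized: MAE = {err:.4f}")
```

In this toy setting the unrandomized explanation reproduces the model output almost exactly, and the fully randomized one does not. A faithfulness measure that passes the sanity check should rank them in that order.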