Estimation of Concept Explanations Should be Uncertainty Aware

Vihari Piratla, Juyeon Heo, Katherine M. Collins, Sukriti Singh, Adrian Weller

TL;DR

This work tackles the noise and data inefficiency of concept-based explanations by introducing U-ACE, a Bayesian estimator that explicitly models uncertainty in concept activations and learns a noise-robust mapping to model logits. Using a multimodal CLIP framework to define concepts from text descriptions, U-ACE derives per-example activation intervals and a posterior over explanation weights, yielding more stable, label-efficient explanations even under misspecified concept sets. Theoretical results show robustness advantages over standard estimators in over- and under-complete concept sets, while empirical evaluations across controlled and real-world datasets demonstrate improved alignment with ground-truth explanations and reduced sensitivity to probe datasets and dataset shifts. The approach demonstrates strong potential for reliable model interpretation in complex, open-world settings and is released with code for broader adoption and further validation.

Abstract

Model explanations can be valuable for interpreting and debugging predictive models. We study a specific kind called Concept Explanations, where the goal is to interpret a model using human-understandable concepts. Although popular for their easy interpretation, concept explanations are known to be noisy. We begin our work by identifying various sources of uncertainty in the estimation pipeline that lead to such noise. We then propose an uncertainty-aware Bayesian estimation method to address these issues, which readily improves the quality of explanations. We demonstrate with theoretical analysis and empirical evaluation that explanations computed by our method are robust to train-time choices while also being label-efficient. Further, our method proved capable of recovering relevant concepts from a bank of thousands in an evaluation with real datasets and off-the-shelf models, demonstrating its scalability. We believe the improved quality of uncertainty-aware concept explanations makes them a strong candidate for more reliable model interpretation. We release our code at https://github.com/vps-anonconfs/uace.
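
To make the idea of a posterior over explanation weights concrete, the following is a minimal sketch of generic conjugate Bayesian linear regression from concept activations to a model logit. It is not the U-ACE estimator itself: the isotropic Gaussian prior, the fixed noise variance, the function name `bayes_linreg_posterior`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def bayes_linreg_posterior(C, y, noise_var=1.0, prior_var=1.0):
    """Gaussian posterior N(mean, cov) over weights w for y = C @ w + eps.

    Assumes (for illustration only) w ~ N(0, prior_var * I) and
    eps ~ N(0, noise_var), the textbook conjugate setting.
    """
    k = C.shape[1]
    precision = np.eye(k) / prior_var + C.T @ C / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ C.T @ y / noise_var
    return mean, cov

# Synthetic stand-ins: 200 examples, 3 concept activations, one logit.
rng = np.random.default_rng(0)
C = rng.normal(size=(200, 3))              # hypothetical concept activations
true_w = np.array([2.0, 0.0, -1.0])        # hypothetical concept importances
y = C @ true_w + 0.1 * rng.normal(size=200)

mean, cov = bayes_linreg_posterior(C, y, noise_var=0.01, prior_var=10.0)
std = np.sqrt(np.diag(cov))                # per-concept posterior uncertainty
```

Reading `mean` together with `std` gives an importance score with an uncertainty estimate for each concept, which is the kind of output an uncertainty-aware explanation reports.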

Paper Structure

This paper contains 41 sections, 5 theorems, 23 equations, 14 figures, 13 tables, 1 algorithm.

Key Result

Proposition 1

For a concept $k$ and $\alpha_k$ defined as above, we have […], where $\cos(\theta_k) = \text{cos-sim}(g_{text}(T_k), g(\mathbf{x}))$ and $\vec{m}(\mathbf{x})_k$, $\vec{s}(\mathbf{x})_k$ denote the $k^{th}$ elements of the vectors $\vec{m}(\mathbf{x})$ and $\vec{s}(\mathbf{x})$.
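
The quantity $\cos(\theta_k)$ in the proposition is a cosine similarity between a text embedding and an input embedding. As a minimal sketch, it can be computed as below; the vectors `text_emb` and `image_emb` are random stand-ins for $g_{text}(T_k)$ and $g(\mathbf{x})$ rather than real CLIP outputs, and the helper name `cos_sim` is hypothetical.

```python
import numpy as np

def cos_sim(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)    # stand-in for g_text(T_k)
image_emb = rng.normal(size=512)   # stand-in for g(x)

cos_theta_k = cos_sim(text_emb, image_emb)
# Clip before arccos to guard against tiny floating-point overshoot.
theta_k = np.arccos(np.clip(cos_theta_k, -1.0, 1.0))
```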

Figures (14)

  • Figure 1: Our proposed estimator: Uncertainty-Aware Concept Explanations (U-ACE). We track uncertainty in concept activation scores from Step 1, and model them in Step 2.
  • Figure 2: (Left) STL dataset with a spurious tag. (Middle) Importance of the tag concept for three different models-to-be-explained. The x-axis shows the probability of the tag in the training dataset of the model-to-be-explained. (Right) Average rank of true concepts among irrelevant concepts; lower is better.
  • Figure 3: Two most relevant concepts plus any mistake (marked in red) from top-10 concepts for a scene-classification model estimated with various algorithms using PASCAL (left) or ADE20K (right) probe-dataset.
  • Figure 4: Toy
  • Figure 5: The left and middle plots show the importance of the red and green concepts, while the rightmost plot shows the difference between their importance scores. U-ACE estimated large uncertainty in the importance score when the red or green concept is missing from the dataset, as seen at the left of the left and middle plots. Also, the difference in importance at either extreme in the right plot is not statistically significant.
  • ...and 9 more figures

Theorems & Definitions (9)

  • Proposition 1
  • Proposition 2
  • Corollary 1
  • Proposition 3
  • proof
  • proof
  • Corollary 2
  • proof
  • proof