Beyond Attribution: Unified Concept-Level Explanations

Junhao Liu, Haonan Yu, Xin Zhang

TL;DR

This work instantiates UnCLE to provide concept-based explanations in three forms (attributions, sufficient conditions, and counterfactuals) and applies it to popular text, image, and multimodal models. The evaluation shows that UnCLE's explanations are more faithful than those of state-of-the-art concept-based explanation methods, and that its richer explanation forms satisfy various user needs.

Abstract

There is an increasing need to integrate model-agnostic explanation techniques with concept-based approaches, as the former can explain models across different architectures while the latter make explanations more faithful and understandable to end-users. However, existing concept-based model-agnostic explanation methods are limited in scope: they focus mainly on attribution-based explanations and neglect other forms such as sufficient conditions and counterfactuals, which narrows their utility. To bridge this gap, we propose UnCLE, a general framework that elevates existing local model-agnostic techniques to provide concept-based explanations. Our key insight is that existing local model-agnostic methods can be uniformly extended to provide unified concept-based explanations by using large pre-trained models for perturbation. We have instantiated UnCLE to provide concept-based explanations in three forms (attributions, sufficient conditions, and counterfactuals) and applied it to popular text, image, and multimodal models. Our evaluation results demonstrate that UnCLE provides explanations that are more faithful than those of state-of-the-art concept-based explanation methods, and that it offers richer explanation forms that satisfy various user needs.

Paper Structure

This paper contains 37 sections, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Examples of using LIME and UnCLE-augmented LIME to explain (a) an image classification model (YOLOv8) and (b) a text classification model (BERT). The UnCLE-augmented versions provide concept-based explanations, utilizing detected objects or topics rather than fragmented superpixels or words.
  • Figure 2: Example explanations from Anchors, LORE, and their UnCLE-augmented versions. The Anchors explanation states that the presence of specific image regions guarantees that the model classifies the image as a pickup. The LORE explanation shows that masking these regions would lead the model to predict a different class. The UnCLE-augmented versions provide concept-based explanations, using detected objects rather than fragmented superpixels.
  • Figure 3: The workflow of UnCLE-augmented local model-agnostic explanation techniques.
  • Figure 4: (a) Average running time of explanation methods we used on image data. (b) Matched-budget comparisons between LIME, EAC, ConceptLIME and UnCLE local unified explanations.
  • Figure 5: Effect of concept absence on explanation fidelity.
  • ...and 6 more figures
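
The contrast drawn in the Figure 1 caption, explaining a prediction over detected concepts rather than superpixels or words, can be illustrated with a small sketch. This is not the UnCLE implementation: `concept_attributions` and `toy_model` are hypothetical names, the toy model scores a concept-presence mask directly instead of a perturbed input produced by a large pre-trained model, and a plain difference-of-means estimator stands in for LIME's weighted linear surrogate.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Mask = Tuple[int, ...]  # 1 = concept kept, 0 = concept removed


def concept_attributions(
    concepts: Sequence[str],
    model: Callable[[Mask], float],
) -> Dict[str, float]:
    """Score each concept by how much its presence shifts the model output.

    Enumerates every presence/absence mask (feasible only for a handful of
    concepts) and reports, per concept, the mean output with the concept
    present minus the mean output with it absent.
    """
    masks = list(product([0, 1], repeat=len(concepts)))
    scores = [model(mask) for mask in masks]

    attributions: Dict[str, float] = {}
    for i, name in enumerate(concepts):
        present = [s for m, s in zip(masks, scores) if m[i] == 1]
        absent = [s for m, s in zip(masks, scores) if m[i] == 0]
        attributions[name] = sum(present) / len(present) - sum(absent) / len(absent)
    return attributions


def toy_model(mask: Mask) -> float:
    """Hypothetical black box: confidence for 'pickup' given visible concepts."""
    truck, road, sky = mask
    return 0.6 * truck + 0.3 * road + 0.0 * sky


attributions = concept_attributions(["truck", "road", "sky"], toy_model)
ranked = sorted(attributions, key=attributions.get, reverse=True)
```

Because the toy model is additive and all masks are enumerated, each attribution recovers the corresponding coefficient. With a real model, the mask would first have to be turned into a perturbed input, and sampling would replace full enumeration.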