An Axiomatic Approach to Model-Agnostic Concept Explanations

Zhili Feng; Michal Moshkovitz; Dotan Di Castro; J. Zico Kolter

TL;DR

This work tackles the lack of model-agnostic concept explanations by introducing an axiomatic framework with three core principles: linearity with respect to examples, recursivity, and similarity. From these axioms, the authors derive a family of measures for concept influence, including symmetric, class-conditioned (necessity), and concept-conditioned (sufficiency) forms, and provide an efficient estimation algorithm. They connect their framework to prior methods like TCAV and completeness-aware explanations, showing how TCAV corresponds to necessity and completeness to sufficiency, while enabling faster, model-agnostic computation. Through experiments on tasks like model and optimizer selection, as well as prompt editing for CLIP-based vision-language models, the approach demonstrates practical utility and interpretability, including automatic concept labeling. Overall, the method offers a principled, scalable path to understanding and improving black-box models via interpretable concepts without requiring access to internal model details.

Abstract

Concept explanation is a popular approach for examining how human-interpretable concepts impact the predictions of a model. However, most existing methods for concept explanations are tailored to specific models. To address this issue, this paper focuses on model-agnostic measures. Specifically, we propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity. We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings. Experimentally, we demonstrate the utility of the new method by applying it in different scenarios: for model selection, optimizer selection, and model improvement using a kind of prompt editing for zero-shot vision language models.

Paper Structure (32 sections, 4 theorems, 32 equations, 8 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Post hoc explanations.
Concept-based explanations.
Axiomatic approach to feature attribution explanations.
Understanding the explanations.
The Axiomatic Approach
Notation.
Problem formulation.
Axiom 1: Linearity with respect to examples
Axiom 2: Recursivity
Axiom 3: Similarity
The new measures and estimation algorithm
TCAV and Completeness-aware Explanations within the Axiomatic Framework
Setting.
...and 17 more sections

Key Result

Theorem 4.0

For concept $c:\mathcal{X}\rightarrow\{-1,+1\}$ and predictor $h:\mathcal{X}\rightarrow\{-1,+1\}$, the following holds

Figures (8)

Figure 1: Necessary and sufficient concepts under our proposed measures. Left: $\mathbb{E}[c(x)|h(x)=1]$, where $h(x)$ predicts the fine-grained Felidae classes (e.g. Persian cat or cheetah), $c(x)$ is the "Felidae" or "wolf" concept. Right: $\mathbb{E}[h(x)|c(x)=1]$ where $h(x)=1$ means the image is a wolf or Felidae, $c(x)$ is the fine-grained Felidae concepts.
Figure 2: Logistic regression versus random forest. In both figures, the $x$-axis represents the concepts with positive measures, and the $y$-axis corresponds to a specific measure of interest. Left: $\mathbb{E}[c(x)|h(x)=1]$, where $h(x)=1$ predicts class "cat". Right: $\mathbb{E}[h(x)|c(x)=1]$, where $h(x)=1$ predicts class "chair".
Figure 3: SGD versus AdamW with $\mathbb{E}[c(x)|h(x)=1]$, where $h(x)$ predicts "motorbike" class.
Figure 4: Prompt editing with $\mathbb{E}[c(x)|h(x)=1]$ where $h(x)=1$ predicts class "bottle".
Figure 5: $\mathbb{E}[c(x)|h(x)=1]$ with original CLIP prompts, edited prompts, and ground truth.
...and 3 more figures
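
The two conditional expectations that recur in the captions above, $\mathbb{E}[c(x)|h(x)=1]$ and $\mathbb{E}[h(x)|c(x)=1]$, can be estimated from samples by plain averaging. The sketch below is a minimal illustration of that idea, not the paper's estimation algorithm. It assumes the concept labels and the model predictions are already available as sequences of $\pm 1$ values; the function names (`conditional_mean`, `necessity`, `sufficiency`) and the toy data are hypothetical. Mapping "necessity" to $\mathbb{E}[c(x)|h(x)=1]$ and "sufficiency" to $\mathbb{E}[h(x)|c(x)=1]$ follows the TL;DR and the Figure 1 caption.

```python
from typing import Sequence


def conditional_mean(values: Sequence[int], condition: Sequence[int]) -> float:
    """Empirical E[values | condition == +1] for +/-1-valued sequences."""
    if len(values) != len(condition):
        raise ValueError("values and condition must have the same length")
    # Keep only the examples on which the conditioning variable equals +1.
    selected = [v for v, k in zip(values, condition) if k == 1]
    if not selected:
        raise ValueError("condition is never +1; the conditional mean is undefined")
    return sum(selected) / len(selected)


def necessity(c: Sequence[int], h: Sequence[int]) -> float:
    """Class-conditioned measure: empirical E[c(x) | h(x) = 1]."""
    return conditional_mean(c, h)


def sufficiency(c: Sequence[int], h: Sequence[int]) -> float:
    """Concept-conditioned measure: empirical E[h(x) | c(x) = 1]."""
    return conditional_mean(h, c)


# Toy data (made up for illustration): concept labels and model predictions.
c = [1, 1, 1, -1, -1, 1]
h = [1, 1, -1, -1, 1, -1]

nec = necessity(c, h)
suf = sufficiency(c, h)
print(nec, suf)
```

Because both inputs are just labels and predictions, the computation needs no access to model internals, which is the sense in which such measures are model-agnostic. Values lie in $[-1, 1]$: a value near $+1$ means the conditioned quantity is almost always $+1$ on the selected examples.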

Theorems & Definitions (12)

Claim 3.0
Definition 3.1
Theorem 4.0
Theorem 4.0
Claim A.0
proof
Claim A.1
proof
Theorem B.0
proof
...and 2 more
