
Concept Gradient: Concept-based Interpretation Without Linear Assumption

Andrew Bai, Chih-Kuan Yeh, Pradeep Ravikumar, Neil Y. C. Lin, Cho-Jui Hsieh

TL;DR

Concept Gradient (CG) extends concept-based interpretation beyond the linear assumption of Concept Activation Vector (CAV) by deriving a gradient-based sensitivity of the model output to concept functions $g: \mathbb{R}^d \to \mathbb{R}^m$ via $R_{\text{CG}}(x; f, g) = \nabla g(x)^{\dagger} \nabla f(x)$. CG unifies and generalizes CAV and GC by explicitly chaining gradients through the shared input space, recovering the derivative $h'(c)$ when a local inverse exists, and reducing to CAV in the linear case. Empirically, CG outperforms CAV on fine-grained image datasets (CUB, AwA2) in both local and global concept attribution and yields qualitatively coherent explanations across semantic levels, including a medical case study on mortality risk that aligns with literature. The work also provides practical guidance for implementing CG, including concept model training via finetuning, layer selection strategies, and normalization considerations, while acknowledging limitations related to differentiability and the need for representative concept data. Overall, CG offers a principled, non-linear, gradient-based framework for post-hoc concept explanations with demonstrated benefits for trust, debugging, and domain-specific decision support.
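As a rough illustration of the formula above, the sketch below computes CG for a single scalar concept ($m = 1$), where the pseudo-inverse of the column vector $\nabla g(x)$ is $\nabla g(x)^T / \|\nabla g(x)\|^2$. The toy functions `g_nonlinear`, `f_model`, `g_linear`, and `f_other`, and the finite-difference helper `grad`, are hypothetical and not taken from the paper; a real implementation would presumably use automatic differentiation on trained networks.

```python
import math

def grad(fn, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function fn at point x."""
    out = []
    for i in range(len(x)):
        up, dn = list(x), list(x)
        up[i] += eps
        dn[i] -= eps
        out.append((fn(up) - fn(dn)) / (2 * eps))
    return out

def concept_gradient(f, g, x):
    """Single-concept CG: pinv(grad g) @ grad f = <grad g, grad f> / ||grad g||^2."""
    grad_f, grad_g = grad(f, x), grad(g, x)
    norm_sq = sum(v * v for v in grad_g)
    if norm_sq == 0.0:
        raise ValueError("grad g vanishes at x; CG is undefined at this point")
    return sum(a * b for a, b in zip(grad_g, grad_f)) / norm_sq

# Hypothetical non-linear concept, and a model that depends on x only through it:
# f(x) = h(g(x)) with h(c) = sin(c), so h'(c) = cos(c).
def g_nonlinear(x):
    return x[0] ** 2 + x[1]

def f_model(x):
    return math.sin(g_nonlinear(x))

x = [0.7, -0.3]
cg = concept_gradient(f_model, g_nonlinear, x)
h_prime = math.cos(g_nonlinear(x))  # CG should match this derivative

# Hypothetical linear concept g(x) = v . x with unit-norm v: CG equals the
# directional derivative of the model along v (a CAV-style sensitivity).
v = [0.6, 0.8]

def g_linear(x):
    return v[0] * x[0] + v[1] * x[1]

def f_other(x):
    return x[0] ** 2 + 3.0 * x[1]

x2 = [0.7, 2.0]
cg_linear = concept_gradient(f_other, g_linear, x2)
directional = sum(a * b for a, b in zip(grad(f_other, x2), v))

print(cg, h_prime)
print(cg_linear, directional)
```

In the first case the two printed values agree up to finite-difference error, matching the claim that CG recovers $h'(c)$ when the model factors through the concept. In the second, CG coincides with the directional derivative along $v$ only because $v$ has unit norm; for a general $v$ the two differ by the factor $\|v\|^2$.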

Abstract

Concept-based interpretations of black-box models are often more intuitive for humans to understand. The most widely adopted approach for concept-based interpretation is Concept Activation Vector (CAV). CAV relies on learning a linear relation between some latent representation of a given model and concepts. Linear separability is usually implicitly assumed but does not hold in general. In this work, we started from the original intent of concept-based interpretation and proposed Concept Gradient (CG), extending concept-based interpretation beyond linear concept functions. We showed that for a general (potentially non-linear) concept, we can mathematically evaluate how a small change in the concept affects the model's prediction, which leads to an extension of gradient-based interpretation to the concept space. We demonstrated empirically that CG outperforms CAV in both toy examples and real-world datasets.

Paper Structure (32 sections, 2 theorems, 38 equations, 7 figures, 10 tables)

This paper contains 32 sections, 2 theorems, 38 equations, 7 figures, 10 tables.

Key Result

Theorem 1

Consider a particular point $\hat{x}$ with $\hat{c}=g(\hat{x})$. Let $h: \mathbb{R}^m\rightarrow \mathbb{R}^d$ be a smooth, differentiable function mapping $c$ to $x$ that satisfies $g(h(c))=c$ locally within the $\epsilon$-ball around $\hat{c}$. Then the gradient of $h$ takes the form $\nabla h(\hat{c}) = \nabla g(\hat{x})^{\dagger} + G_\perp$, where any row vector of $G_\perp$ belongs to $\text{null}(\nabla g(\hat{x})^T)$ (the null space of $\nabla g(\hat{x})^T$).
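
The following is a short sketch of why a pseudo-inverse plus a null-space term is the natural form here. It is not the paper's proof. It assumes the convention $\nabla g(\hat{x}) \in \mathbb{R}^{d \times m}$ and $\nabla h(\hat{c}) \in \mathbb{R}^{m \times d}$ (gradients as transposed Jacobians), and it additionally assumes that $\nabla g(\hat{x})$ has full column rank; both are assumptions made for this illustration.

```latex
% Assumed convention: \nabla g(\hat{x}) is d x m, \nabla h(\hat{c}) is m x d.
% Assumed: \nabla g(\hat{x}) has full column rank, so its pseudo-inverse is a left inverse.
\begin{align*}
g(h(c)) = c
  \;&\Longrightarrow\; \nabla h(\hat{c})\,\nabla g(\hat{x}) = I_m
  && \text{(chain rule at } \hat{c}\text{)} \\
\nabla g(\hat{x})^{\dagger}\,\nabla g(\hat{x}) &= I_m
  && \text{(left inverse under full column rank)} \\
\bigl(\nabla h(\hat{c}) - \nabla g(\hat{x})^{\dagger}\bigr)\,\nabla g(\hat{x}) &= 0
  && \text{(subtract the two identities)}
\end{align*}
% Setting G_\perp := \nabla h(\hat{c}) - \nabla g(\hat{x})^{\dagger}, every row r of G_\perp
% satisfies r \nabla g(\hat{x}) = 0, i.e. r^T \in \mathrm{null}(\nabla g(\hat{x})^T).
```

Under these assumptions, every local inverse $h$ has a gradient that differs from $\nabla g(\hat{x})^{\dagger}$ only by a term that $\nabla g(\hat{x})$ annihilates. Choosing $G_\perp = 0$ leaves the pseudo-inverse alone, which is the factor that appears in $R_{\text{CG}}(x; f, g) = \nabla g(x)^{\dagger} \nabla f(x)$.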

Figures (7)

  • Figure 1: Comparison of feature-based interpretation heatmap (left: Integrated Gradients) and concept-based importance score (right: Concept Gradients) for the model prediction of "Black footed Albatross". Attribution to high-level concepts is more informative to humans than raw pixels.
  • Figure 2: CUB concept recalls for different input representations in various layers and architectures (left to right, deep to shallow layers). CG consistently performs better than CAV locally and globally.
  • Figure 3: Visualization of instances with highest CG attributed importance (AwA2 validation set) for each concept (top 1 instance in the top 3 classes per concept). CG is capable of handling low level (colors), middle level (textures), and high level (body components) concepts simultaneously.
  • Figure 4: Concept prediction accuracy and concept attribution recall when finetuning starting from different layers of the model. Finetuning more layers leads to higher concept prediction accuracy.
  • Figure 5: Variance of gradients finetuning starting from different layers. The variance is higher when finetuning starting from earlier layers.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • proof