FACE: Faithful Automatic Concept Extraction

Dipkamal Bhusal, Michael Clifford, Sara Rampazzi, Nidhi Rastogi

TL;DR

FACE introduces a KL divergence-regularized NMF to learn concept representations that faithfully reflect a model’s downstream predictions. By supervising the factorization with the classifier’s output, FACE provides concept-based explanations that remain consistent with the original decision process, and it offers theoretical guarantees that bound predictive deviation. Empirically, FACE outperforms prior methods on faithfulness and sparsity across ImageNet, COCO, and CelebA, while maintaining competitive reconstruction. The approach advances practical interpretability by delivering semantically coherent yet behaviorally faithful concepts suitable for debugging and trust-building in vision models.

Abstract

Interpreting deep neural networks through concept-based explanations offers a bridge between low-level features and high-level human-understandable semantics. However, existing automatic concept discovery methods often fail to align these extracted concepts with the model's true decision-making process, thereby compromising explanation faithfulness. In this work, we propose FACE (Faithful Automatic Concept Extraction), a novel framework that augments Non-negative Matrix Factorization (NMF) with a Kullback-Leibler (KL) divergence regularization term to ensure alignment between the model's original and concept-based predictions. Unlike prior methods that operate solely on encoder activations, FACE incorporates classifier supervision during concept learning, enforcing predictive consistency and enabling faithful explanations. We provide theoretical guarantees showing that minimizing the KL divergence bounds the deviation in predictive distributions, thereby promoting faithful local linearity in the learned concept space. Systematic evaluations on ImageNet, COCO, and CelebA datasets demonstrate that FACE outperforms existing methods across faithfulness and sparsity metrics.
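
The objective described above (NMF reconstruction plus a KL term tying concept-based predictions to the original predictions) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The names `face_style_objective`, `head_weights`, `head_bias`, and `lam` are hypothetical, the classifier head is assumed to be a single linear layer followed by softmax, and the KL direction (original predictions as the reference distribution) is an assumption, since the abstract does not state it.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Mean over rows of KL(p || q) for row-stochastic matrices p and q."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def face_style_objective(A, U, W, head_weights, head_bias, lam=1.0):
    """Reconstruction error plus a KL penalty on the classifier outputs.

    A: (n, d) non-negative activations.
    U: (n, r) concept coefficients, W: (r, d) concept basis (both non-negative).
    head_weights, head_bias: assumed linear classifier head (hypothetical names).
    lam: assumed weight of the KL term.
    Returns (total, reconstruction_error, kl).
    """
    A_hat = U @ W                                   # concept-based reconstruction
    recon = float(np.sum((A - A_hat) ** 2))         # squared Frobenius norm
    p = softmax(A @ head_weights + head_bias)       # original predictions
    q = softmax(A_hat @ head_weights + head_bias)   # concept-based predictions
    kl = kl_divergence(p, q)
    return recon + lam * kl, recon, kl


# Toy usage: an exact factorization gives zero loss, a perturbed one does not.
rng = np.random.default_rng(0)
U = rng.random((8, 3))
W = rng.random((3, 5))
A = U @ W
head_weights = rng.standard_normal((5, 4))
head_bias = np.zeros(4)

exact_total, exact_recon, exact_kl = face_style_objective(
    A, U, W, head_weights, head_bias
)
pert_total, pert_recon, pert_kl = face_style_objective(
    A, U + 0.5, W, head_weights, head_bias
)
```

In this sketch a factorization can have low reconstruction error and still incur a large KL term if the reconstructed activations shift the classifier's output distribution, which is the gap between reconstruction quality and faithfulness that the abstract highlights.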

Paper Structure

This paper contains 54 sections, 19 equations, 18 figures, 7 tables.

Figures (18)

  • Figure 1: Comparing concepts extracted by CRAFT \cite{fel2023craft}, ICE \cite{zhang2021invertible}, and FACE from rabbit images classified by ResNet-34 \cite{he2016deep}. C1 and C2 correspond to the top two Sobol-importance concepts for each method. FACE achieves higher faithfulness than CRAFT and ICE (discussed in Section \ref{section:faithfulnessComplexity}), demonstrating that FACE’s extracted concepts better align with the model’s true reasoning.
  • Figure 2: Comparison of predictive alignment across concept extraction methods using ResNet-34 \cite{he2016deep}. FACE achieves both accurate and faithful reconstructions, while CRAFT and ICE may preserve top-1 predictions yet diverge in KL-divergence.
  • Figure 3: Qualitative comparison of top-concept extraction by CRAFT \cite{fel2023craft}, ICE \cite{zhang2021invertible}, and our method, FACE, for four classes (Golf, Church, Gray hair, and Tench) using ResNet-34 \cite{he2016deep}.
  • Figure 4: Effect of KL regularization strength $\lambda$ on faithfulness (Concept Insertion (C-Ins) and Concept Deletion (C-Del)) and classifier accuracy on reconstructed activations across datasets. A small KL penalty improves faithfulness across all datasets. However, large $\lambda$ values degrade performance on datasets with many classes (ImageNet, COCO), while CelebA benefits from stronger regularization due to lower class complexity.
  • Figure 5: Effect of decomposition rank ($r$) on faithfulness (Concept Insertion (C-Ins), Concept Deletion (C-Del)) and sparsity (C-Gini) across datasets. Increasing rank improves both faithfulness and sparsity, with diminishing gains beyond $r=25$.
  • ...and 13 more figures