Statistically Significant Concept-based Explanation of Image Classifiers via Model Knockoffs

Kaiwen Xu, Kazuto Fukuchi, Youhei Akimoto, Jun Sakuma

TL;DR

Concept-based explanations can mislead when false positives appear among interpretable factors. The authors propose a statistically grounded framework using Model-X Knockoff to select significant concepts while enforcing the false discovery rate at level $q$, applicable to both unsupervised (CSR-VAE) and supervised (CSR-CBM) concept learning. They prove an FDR guarantee and validate the approach on synthetic data and real datasets (Colored MNIST, CelebA), demonstrating improved interpretability without sacrificing predictive performance. By coupling sparse concept learning with Knockoff-based selection, the method provides reliable, human-understandable explanations that enhance trust in deep image classifiers.
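To make "false discovery rate at level $q$" concrete, the minimal Python sketch below computes the false discovery proportion (FDP) and the power of a single selection. The FDR is the expectation of the FDP, which is typically estimated by averaging over repeated trials. The function names and the toy index sets are illustrative assumptions and are not taken from the paper.

```python
def false_discovery_proportion(selected, relevant):
    """Fraction of selected concepts that are not truly relevant.

    By convention the FDP is 0 when nothing is selected.
    """
    selected, relevant = set(selected), set(relevant)
    false_positives = len(selected - relevant)
    return false_positives / max(len(selected), 1)


def power(selected, relevant):
    """Fraction of truly relevant concepts that were selected."""
    selected, relevant = set(selected), set(relevant)
    return len(selected & relevant) / max(len(relevant), 1)


def estimate_fdr(selections, relevant):
    """Average the FDP over several independent trials (hypothetical helper)."""
    fdps = [false_discovery_proportion(s, relevant) for s in selections]
    return sum(fdps) / len(fdps)


# Toy example: concepts 0..3 are truly relevant (assumed ground truth).
relevant_concepts = [0, 1, 2, 3]
trial_selections = [[0, 1, 2, 5], [0, 1, 2, 3], []]
fdr_estimate = estimate_fdr(trial_selections, relevant_concepts)
```

A selection procedure controls the FDR at level $q$ when this expectation stays at or below $q$; an empirical average such as `fdr_estimate` only approximates it.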

Abstract

A concept-based classifier can explain the decision process of a deep learning model through human-understandable concepts in image classification problems. However, concept-based explanations can produce false positives, in which unrelated concepts are mistakenly regarded as important for the prediction task. Our goal is to find the statistically significant concepts for classification to prevent misinterpretation. In this study, we propose a method that uses a deep learning model to learn image concepts and then uses Knockoff samples to select the concepts important for prediction while controlling the False Discovery Rate (FDR) under a certain value. We evaluate the proposed method in synthetic and real data experiments. The results show that our method controls the FDR properly while selecting highly interpretable concepts, improving the trustworthiness of the model.
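The selection step described above can be sketched as follows. This is a minimal illustration of the standard Knockoff+ thresholding rule from the Model-X Knockoff literature, assuming the importance statistics $W_j$ have already been computed (for example, as the difference between the magnitude of an original concept's coefficient and that of its knockoff). The function names and toy values are hypothetical and are not the authors' implementation.

```python
import numpy as np


def knockoff_plus_threshold(W, q):
    """Smallest t > 0 whose estimated FDP is at most q (Knockoff+ rule).

    Returns np.inf if no candidate threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        n_negative = np.sum(W <= -t)  # proxy for the number of false discoveries
        n_positive = np.sum(W >= t)   # number of concepts selected at threshold t
        fdp_hat = (1 + n_negative) / max(n_positive, 1)
        if fdp_hat <= q:
            return t
    return np.inf


def knockoff_select(W, q):
    """Indices of concepts whose statistic reaches the Knockoff+ threshold."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau)


# Toy statistics: large positive values suggest a concept matters more
# than its knockoff copy (values are made up for illustration).
W_toy = [5.0, 4.0, 3.0, 2.0, -1.0, 0.5, 6.0, 7.0]
selected = knockoff_select(W_toy, q=0.25)
```

With a stricter target level the rule may select nothing at all, which is the expected behavior when the evidence is too weak to meet the requested FDR.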

Paper Structure (39 sections, 2 theorems, 11 equations, 5 figures, 1 table, 1 algorithm)

This paper contains 39 sections, 2 theorems, 11 equations, 5 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

(Candès et al., 2018) Given $\bm{Y}$ and $\bm{X}$ following the linear model, let $\tilde{\bm{X}}$ be the Knockoff sample satisfying Eq. (modelx_d1) and Eq. (modelx_d2). Suppose $W_j$ is calculated by Eq. (modelx_2) from $\bm{Y}$, $\bm{X}$, and $\tilde{\bm{X}}$. Given a target FDR level $q$, we select variables … Then the FDR of $\hat{\mathcal{S}}$ is controlled at the prescribed level, i.e., …
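
The selection rule and the final bound are cut off in the theorem text above. For reference, the standard Knockoff+ result from Candès et al. (2018), which the statement appears to follow, can be written as below. Here $\tau$ (the data-dependent threshold) and $\mathcal{H}_0$ (the set of null, i.e. irrelevant, variables) are notation assumed for this sketch; the paper's own symbols and equation numbering may differ.

```latex
\[
\tau = \min\left\{ t > 0 :
  \frac{1 + \left|\{ j : W_j \le -t \}\right|}
       {\max\left( \left|\{ j : W_j \ge t \}\right|,\, 1 \right)} \le q \right\},
\qquad
\hat{\mathcal{S}} = \{ j : W_j \ge \tau \},
\]
\[
\mathrm{FDR}
  = \mathbb{E}\left[
      \frac{\left| \hat{\mathcal{S}} \cap \mathcal{H}_0 \right|}
           {\max\left( \left| \hat{\mathcal{S}} \right|,\, 1 \right)}
    \right]
  \le q .
\]
```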

Figures (5)

  • Figure 1: Experimental results on synthetic data in unsupervised and supervised settings. The results are averaged over 10 independent trials.
  • Figure 2: Experimental results on real concepts in unsupervised and supervised settings. The results are averaged over 10 independent trials.
  • Figure 3: The change rate of prediction outcomes for images reconstructed by $\omega$, and the feature selection results of different methods.
  • Figure 4: Demonstration of feature selection for digit classification in the unsupervised setting (Colored-MNIST).
  • Figure 5: Demonstration of feature selection results in the supervised setting (CelebA).

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2