
Towards Counterfactual and Contrastive Explainability and Transparency of DCNN Image Classifiers

Syed Ali Tariq, Tehseen Zia, Mubeen Ghafoor

TL;DR

This paper tackles explainability for DCNN image classifiers by proposing a model-intrusive framework that exposes a network's internal reasoning through counterfactual and contrastive explanations. It introduces two predictive models that identify minimum correct (MC) and minimum incorrect (MI) filters in the top convolutional layer. The concepts these filters encode can be visualized through receptive-field mappings and used for misclassification analysis. Explanations are produced by muting or augmenting specific filters. The method is demonstrated on the CUB dataset with a VGG-16 backbone through qualitative visualizations, a user study, and quantitative analyses that break down class-specific features and show the trade-off between sparsity and accuracy. The work aims to improve transparency and trust in DCNNs for high-stakes applications, and it points to future work on better evaluation metrics and on applying the method to debugging and teaching tasks.
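To make "muting" filters and the sparsity/accuracy trade-off concrete, here is a minimal sketch. It assumes that each top-layer filter has been pooled to a single activation and fed to a single linear classification head, which is a simplification: VGG-16's real head has several fully connected layers. The binary MC masks, the toy numbers, and the helper names `predict`, `mute_filters`, and `sparsity_accuracy` are hypothetical and do not come from the paper. Sparsity is measured here as the mean fraction of filters kept. Accuracy is measured as the fraction of images whose prediction survives muting the non-MC filters.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def predict(features: Sequence[float], weights: List[Vector], biases: Vector) -> int:
    """Argmax class of a (hypothetical) linear head: logit_k = w_k . x + b_k."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(logits)), key=lambda k: logits[k])


def mute_filters(features: Sequence[float], mc_mask: Sequence[int]) -> Vector:
    """Zero out (mute) every filter whose MC mask entry is 0; keep the rest."""
    return [x * m for x, m in zip(features, mc_mask)]


def sparsity_accuracy(batch: List[Vector], masks: List[List[int]],
                      weights: List[Vector], biases: Vector) -> Tuple[float, float]:
    """Return (mean fraction of filters kept, fraction of predictions preserved)."""
    kept, preserved = 0.0, 0
    for features, mask in zip(batch, masks):
        kept += sum(mask) / len(mask)
        original = predict(features, weights, biases)
        masked = predict(mute_filters(features, mask), weights, biases)
        preserved += int(original == masked)
    n = len(batch)
    return kept / n, preserved / n


# Toy setup (made-up numbers): 4 pooled filters, 2 classes.
W = [[1.0, 0.0, 0.5, 0.0],   # class 0 relies mostly on filter 0
     [0.0, 1.0, 0.0, 0.5]]   # class 1 relies mostly on filter 1
b = [0.0, 0.0]
batch = [[2.0, 1.0, 1.0, 1.0], [0.5, 2.0, 0.0, 1.0]]
masks = [[1, 0, 0, 0], [1, 0, 0, 0]]  # keep only filter 0 for both images

print(sparsity_accuracy(batch, masks, W, b))  # (0.25, 0.5)
```

In this toy case, keeping one filter in four preserves the first image's prediction but flips the second. This illustrates how an overly sparse mask can cost accuracy.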

Abstract

Explainability of deep convolutional neural networks (DCNNs) is an important research topic that seeks to uncover the reasons behind a DCNN model's decisions and to improve the understanding and reliability of such models in high-risk environments. In this regard, we propose a novel method for generating interpretable counterfactual and contrastive explanations for DCNN models. The proposed method is model-intrusive: it probes the internal workings of a DCNN instead of altering the input image to generate explanations. Given an input image, we provide contrastive explanations by identifying the most important filters in the DCNN, which represent the features and concepts that separate the model's decision between classifying the image as the original inferred class or as some other specified alter class. We provide counterfactual explanations by specifying the minimal changes to such filters that are necessary to obtain a contrastive output. Using these identified filters and concepts, our method can provide contrastive and counterfactual reasons behind a model's decisions and make the model more transparent. One interesting application of this method is misclassification analysis, where we compare the concepts identified in a particular input image with class-specific concepts to establish the validity of the model's decisions. The proposed method is compared with state-of-the-art methods and evaluated on the Caltech-UCSD Birds (CUB) 2011 dataset to show the usefulness of the explanations provided.
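The misclassification analysis compares the concepts found in one image with class-specific concepts. The sketch below shows one possible way to do that comparison. Everything in it is an assumption for illustration, not the paper's procedure. Class-specific concepts are taken to be the filters that most often appear among the top-k MC filters of training images of a class. The comparison score is the fraction of a test image's top-k MC filters that fall in that set. The helper names `top_k_filters`, `class_concepts`, and `concept_overlap` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set


def top_k_filters(mc_scores: Sequence[float], k: int = 3) -> List[int]:
    """Indices of the k filters with the highest MC scores (ties -> lower index)."""
    return sorted(range(len(mc_scores)), key=lambda i: (-mc_scores[i], i))[:k]


def class_concepts(per_image_scores: List[Sequence[float]],
                   k: int = 3, keep: int = 3) -> Set[int]:
    """Filters that most often appear in an image's top-k MC filters for one class."""
    counts: Counter = Counter()
    for scores in per_image_scores:
        counts.update(top_k_filters(scores, k))
    return {idx for idx, _ in counts.most_common(keep)}


def concept_overlap(image_scores: Sequence[float], concepts: Set[int], k: int = 3) -> float:
    """Fraction of the image's top-k MC filters that are class-specific concepts."""
    top = top_k_filters(image_scores, k)
    return sum(i in concepts for i in top) / k


# Made-up MC scores over 6 filters for three training images of one class.
train = [
    [0.9, 0.1, 0.8, 0.0, 0.7, 0.0],
    [0.8, 0.0, 0.9, 0.1, 0.6, 0.0],
    [0.7, 0.2, 0.6, 0.0, 0.0, 0.5],
]
concepts = class_concepts(train)               # {0, 2, 4}
image_a = [0.9, 0.0, 0.8, 0.0, 0.7, 0.1]       # relies on the class's usual filters
image_b = [0.1, 0.9, 0.0, 0.8, 0.2, 0.7]       # relies on unrelated filters
print(concept_overlap(image_a, concepts))      # 1.0
print(concept_overlap(image_b, concepts))      # 0.0
```

A low overlap, as for `image_b`, would flag a prediction whose supporting concepts do not match those typical of the inferred class. Such a case would be worth inspecting as a possible misclassification. Any decision threshold on this score would also be an assumption.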

Paper Structure (18 sections, 9 equations, 9 figures, 5 tables, 2 algorithms)

Figures (9)

  • Figure 1: Explanation provided by the proposed approach. Our method identifies the most important minimum correct (MC) and minimum incorrect (MI) filters, with which the pre-trained model predicts the input image as its original inferred class or as some chosen alter class, respectively. The top-3 MC filters for the example image classified as "Red-winged blackbird" show that the red spot on this bird's wing is its most discriminating feature. In contrast, the top-3 MI filters of this example for the "Bronzed cowbird" class show that if features such as the bird's red eyes and blue-tinged feathers, represented by these filters, were present in the input, the model would have been more likely to predict this image as "Bronzed cowbird."
  • Figure 2: Overall block diagram of the proposed counterfactual and contrastive explanation model. Given an input image, the contrastive and counterfactual filter generation networks predict the MC and MI filter maps, respectively. The MC filter map is multiplied with the pre-classification output of the pre-trained network to disable all but the important features with which the model still predicts the original inferred class. Similarly, the MI filter map is used to alter the activation magnitudes of the pre-trained model so that the model predicts the image as some alter class $c'$.
  • Figure 3: Contrastive and counterfactual explanations for a sample image from the CUB dataset. The contrastive explanation highlights the top-3 MC filters, which represent features important for the inferred class. The counterfactual explanation highlights the top-3 MI filters, which represent features required for the alter class.
  • Figure 4: Artificially introducing the most important features relevant to the alter class (Bronzed cowbird) identified in Fig. 3. (a) Adding eye color. (b) Adding wing pattern. (c) Adding both eye color and wing pattern.
  • Figure 5: Explanation comparison of GradCAM [selvaraju2017grad], SCOUT [wang2020scout], and the proposed CFE method for a query image that was classified as "Bronzed cowbird", with the alter class set to "Red-winged blackbird". (a) GradCAM explanation for the inferred class (Bronzed cowbird). (b) GradCAM explanation for the alter class. (c) SCOUT explanation for why the image is classified as the inferred class and not as the alter class. (d) Contrastive explanation of the proposed CFE method. (e) Counterfactual explanation of the proposed CFE method.
  • ...and 4 more figures
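The MI step in Figure 2 and the top-3 MI filters in Figures 1 and 3 can be illustrated with the sketch below. It makes several assumptions that are not stated in this summary. The pooled filter activations feed a single linear head. The MI filter map is applied as non-negative additive offsets to those activations. The "top" MI filters are simply the filters with the largest offsets. The toy weights and the names `predict`, `apply_mi_map`, and `top_mi_filters` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def predict(features: Sequence[float], weights: List[Vector], biases: Vector) -> int:
    """Argmax class of a (hypothetical) linear head over pooled filter activations."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(logits)), key=lambda k: logits[k])


def apply_mi_map(features: Sequence[float], mi_map: Sequence[float]) -> Vector:
    """Assumed MI step: augment each filter's activation by a non-negative offset."""
    return [x + max(0.0, d) for x, d in zip(features, mi_map)]


def top_mi_filters(mi_map: Sequence[float], k: int = 3) -> List[int]:
    """Filters receiving the largest offsets (ties broken by lower index)."""
    return sorted(range(len(mi_map)), key=lambda i: (-mi_map[i], i))[:k]


# Toy setup (made-up numbers): 4 pooled filters, 2 classes.
W = [[1.0, 0.0, 0.5, 0.0],   # inferred class 0
     [0.0, 1.0, 0.0, 0.5]]   # alter class c' = 1
b = [0.0, 0.0]
features = [2.0, 1.0, 1.0, 1.0]
mi_map = [0.0, 1.5, 0.0, 0.2]  # boost filters that class 1 relies on

print(predict(features, W, b))                       # 0 (inferred class)
print(predict(apply_mi_map(features, mi_map), W, b)) # 1 (alter class)
print(top_mi_filters(mi_map, k=2))                   # [1, 3]
```

Under these assumptions, the filters with the largest offsets play the role of the counterfactual features. In this toy head, strengthening filters 1 and 3 is enough to flip the decision to the alter class.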