Faithful Counterfactual Visual Explanations (FCVE)

Bismillah Khan, Syed Ali Tariq, Tehseen Zia, Muhammad Ahsan, David Windridge

TL;DR

This paper tackles the opacity of deep vision models by proposing Faithful Counterfactual Visual Explanations (FCVE), a post-hoc method that exposes model reasoning through minimal changes made to internal filter activations rather than to pixels. Building on a prior Counterfactual Explanation (CFE) framework, FCVE identifies counterfactual and contrastive filters in the last convolutional layer and visualizes their effects through a decoder, yielding plausible and faithful counterfactual images. The approach is evaluated on MNIST and Fashion-MNIST, where it produces more realistic counterfactuals and better quantitative scores (Proximity and FID) than existing methods. Overall, FCVE demonstrates that explanations can be both interpretable to humans and faithful to the model’s internal decision processes; applying it to broader domains and extending the evaluation are left to future work.
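
The summary names Proximity as a metric without defining it. As a rough illustration only, a proximity-style score can be read as the mean absolute pixel difference between a query image and its counterfactual, where a lower value means the counterfactual stays closer to the query. This definition, the function name `proximity`, and the toy pixel values are assumptions made here for illustration; they are not the paper's formula.

```python
from typing import Sequence


def proximity(query: Sequence[float], counterfactual: Sequence[float]) -> float:
    """Mean absolute difference between two flattened images.

    Assumed, illustrative definition: lower means the counterfactual
    is closer to the query image.
    """
    if len(query) != len(counterfactual):
        raise ValueError("images must have the same number of pixels")
    if not query:
        raise ValueError("images must not be empty")
    total = sum(abs(q - c) for q, c in zip(query, counterfactual))
    return total / len(query)


# Toy 4-pixel "images" with intensities in [0, 1] (hypothetical values).
query_image = [0.0, 1.0, 0.5, 0.5]
counterfactual_image = [0.0, 0.0, 0.5, 1.0]
score = proximity(query_image, counterfactual_image)
```

Under this reading, an identical image pair scores zero, and the score grows as more pixels change or as individual changes get larger.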

Abstract

Deep learning models in computer vision have made remarkable progress, but their lack of transparency and interpretability remains a challenge. The development of explainable AI can enhance the understanding and performance of these models. However, existing techniques often struggle to provide convincing explanations that non-experts can easily understand, and they cannot accurately identify models' intrinsic decision-making processes. To address these challenges, we propose a counterfactual explanation (CE) model that balances plausibility and faithfulness. This model generates easy-to-understand visual explanations by making the minimum necessary changes to images without directly altering the pixel data. Instead, the proposed method identifies internal concepts and filters learned by models and leverages them to produce plausible counterfactual explanations. The provided explanations reflect the internal decision-making process of the model, thus ensuring faithfulness to the model.
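
To make the idea of changing filters rather than pixels concrete, here is a minimal sketch. It assumes a toy setting: four pooled filter activations from the last convolutional layer feed a linear classification head over two classes. The helper names `edit_filters` and `predict`, the weights, and the choice of which filters to switch on or off are all hypothetical. The sketch does not reproduce the paper's procedure for finding the filters, nor the decoder that turns the edited activations into an image.

```python
from typing import List, Sequence


def edit_filters(
    activations: Sequence[float],
    boost: Sequence[int],
    suppress: Sequence[int],
    boost_value: float = 1.0,
) -> List[float]:
    """Return a copy of the activations with chosen filters switched on or off."""
    edited = list(activations)
    for i in suppress:
        edited[i] = 0.0  # switch the filter off
    for i in boost:
        edited[i] = max(edited[i], boost_value)  # switch the filter on
    return edited


def predict(activations: Sequence[float], weights: Sequence[Sequence[float]]) -> int:
    """Linear head: index of the class with the highest score."""
    scores = [sum(w * a for w, a in zip(row, activations)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical head: class 0 relies on filters 0 and 1, class 1 on filters 2 and 3.
weights = [
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
query_activations = [0.9, 0.8, 0.1, 0.0]
source_class = predict(query_activations, weights)

# Edit activations only; the input pixels are never touched.
cf_activations = edit_filters(query_activations, boost=[2, 3], suppress=[0])
target_class = predict(cf_activations, weights)
```

In this toy case the prediction moves from class 0 to class 1 after three filters are edited. In the approach described above, a decoder would then render the edited activations as a counterfactual image.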

Paper Structure (14 sections, 10 equations, 7 figures, 1 table, 1 algorithm)

This paper contains 14 sections, 10 equations, 7 figures, 1 table, 1 algorithm.

Figures (7)

  • Figure 1: Block diagram of the proposed visual counterfactual explanation model. The method consists of two steps: first, contrastive and counterfactual filters are identified to explain the classifier's decisions; second, these filters are visualized by generating images from the modified activations. The decoder is initially trained with all filters intact to reconstruct the input, so that when the encoder's output is altered using the identified filters, their effect becomes visible in the reconstructed image.
  • Figure 2: Visual comparison of counterfactual explanation methods. The first column shows the query images from MNIST and FMNIST, while the other five columns display the counterfactuals generated by ExpGAN [samangouei2018explaingan], CEM [dhurandhar2018explanations], CVE [pmlr-v97-goyal19a], C3LT [khorram2022cycle], and our proposed model (FCVE), respectively. The proposed method generates counterfactuals by manipulating the internal activations of the model, resulting in counterfactuals that are more meaningful and realistic than those of the other methods.
  • Figure 3: Plausible counterfactuals generated with digit seven as the source class and digit nine as the target class. The proposed method finds the minimal changes to neuron activations such that an input of one class is transformed into another.
  • Figure 4: Counterfactuals generated for random source and target classes of the MNIST dataset. As in Fig. 3, the proposed method finds the minimal changes to neuron activations such that an input of one class is transformed into another.
  • Figure 5: Plausible counterfactuals generated for the FMNIST dataset. The first row shows the source class “Pullover”. The second and third rows show the target classes “Dress” and “Coat”, into which the source image is transformed by altering the filter activations.
  • ...and 2 more figures