Faithful Counterfactual Visual Explanations (FCVE)
Bismillah Khan, Syed Ali Tariq, Tehseen Zia, Muhammad Ahsan, David Windridge
TL;DR
This paper tackles the opacity of deep vision models by proposing Faithful Counterfactual Visual Explanations (FCVE), a post-hoc method that reveals model reasoning through minimal changes made to internal filter concepts rather than to pixels. Building on a prior Counterfactual Explanation (CFE) framework, FCVE identifies counterfactual and contrastive filters in the last convolutional layer and visualizes their effects via a decoder, yielding plausible and faithful counterfactual images. The approach is validated on MNIST and Fashion-MNIST, where it produces more realistic counterfactuals and better quantitative scores (Proximity and FID) than existing methods. Overall, FCVE demonstrates that explanations can be both interpretable to humans and faithful to the model’s internal decision processes; applying it to broader domains and evaluating it more widely are left to future work.
Abstract
Deep learning models in computer vision have made remarkable progress, but their lack of transparency and interpretability remains a challenge. The development of explainable AI can enhance the understanding and performance of these models. However, existing techniques often struggle to provide convincing explanations that non-experts can easily understand, and they cannot accurately identify a model's intrinsic decision-making process. To address these challenges, we propose a counterfactual explanation (CE) model that balances plausibility and faithfulness. This model generates easy-to-understand visual explanations by making the minimum necessary changes to an image without directly editing its pixel data. Instead, the proposed method identifies the internal concepts and filters learned by the model and leverages them to produce plausible counterfactual explanations. Because the explanations reflect the model's internal decision-making process, they remain faithful to the model.
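
The idea of changing a prediction by editing filters rather than pixels can be illustrated with a toy sketch. The code below is not the authors' method: it assumes a purely linear classification head over pooled filter activations and a simple greedy selection rule, and the names `filter_counterfactual` and `on_value` are hypothetical. The paper's actual filter-identification procedure and its decoder are not described in this section and are not reproduced here.

```python
# Toy sketch (assumption: linear head over pooled filter activations).
# It greedily picks filters to switch off or on until the prediction flips
# to a chosen target class. All names here are illustrative only.

def logits(acts, weights):
    """Class scores for a linear head: weights[c][k] * acts[k], summed over k."""
    return [sum(w * a for w, a in zip(row, acts)) for row in weights]


def predict(acts, weights):
    """Index of the highest-scoring class (first index wins on ties)."""
    scores = logits(acts, weights)
    return max(range(len(scores)), key=scores.__getitem__)


def filter_counterfactual(acts, weights, target, on_value=1.0):
    """Greedily edit filter activations so the prediction becomes `target`.

    Active filters may be disabled (set to 0.0); inactive filters may be
    enabled (set to `on_value`). Returns (edited activations, enabled filter
    indices, disabled filter indices, whether the prediction flipped).
    """
    source = predict(acts, weights)
    edited = list(acts)

    # Score each possible single-filter edit by how much it raises the
    # margin between the target logit and the original (source) logit.
    candidates = []
    for k, a in enumerate(acts):
        new_a = 0.0 if a > 0 else on_value
        gain = (weights[target][k] - weights[source][k]) * (new_a - a)
        if gain > 0:
            candidates.append((gain, k, new_a))
    candidates.sort(reverse=True)

    enabled, disabled = [], []
    for _gain, k, new_a in candidates:
        if predict(edited, weights) == target:
            break  # stop as soon as the prediction has flipped
        edited[k] = new_a
        (enabled if new_a > 0 else disabled).append(k)

    return edited, enabled, disabled, predict(edited, weights) == target


if __name__ == "__main__":
    # Two classes, four filters; the values are made up for illustration.
    weights = [[2.0, 0.0, 0.6, 0.0],
               [0.0, 1.5, 0.4, 1.0]]
    acts = [1.0, 0.0, 1.0, 0.0]
    edited, enabled, disabled, flipped = filter_counterfactual(acts, weights, target=1)
    print(edited, enabled, disabled, flipped)
```

In this toy setting, the disabled filters play a role loosely analogous to filters that supported the original class, and the enabled ones to filters associated with the target class. A real system would still need a decoder to turn the edited activations into an image, which this sketch omits.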
