Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space
Yukai Zhang, Ao Xu, Zihao Li, Tieru Wu
TL;DR
This work tackles the problem of generating faithful image counterfactual explanations for black-box image classifiers. It introduces a two-stage framework that combines Wasserstein-based feature-layer selection and fusion with a Distribution Preference Mahalanobis Distance (DPMD) to produce counterfactuals in feature space, which a GAN-based generator then maps back to images. The approach, termed DPMDCE, shows improved fidelity in both feature (latent) and pixel spaces on MNIST and outperforms several baselines (Min-Edit, PIECE, CEM, Proto-CF) on key metrics. By explicitly modeling feature-space distributions and using a distribution-aware distance, the method aims to make counterfactual explanations for black-box models more interpretable and robust, which matters for deploying AI systems that users can trust.
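To make the idea of a distribution-aware distance concrete, the following is a minimal sketch of the standard Mahalanobis distance between a feature vector and the empirical distribution of a class's features. It is not the paper's DPMD, whose distribution-preference terms are not specified in this summary. The function name `mahalanobis_distance`, the regularizer `eps`, and the synthetic two-dimensional "target-class features" are all assumptions made for illustration.

```python
import numpy as np

def mahalanobis_distance(x, class_features, eps=1e-6):
    """Mahalanobis distance from x to the empirical distribution of class_features.

    class_features has shape (n_samples, n_dims); eps is a hypothetical
    ridge term that keeps the covariance matrix invertible.
    """
    mu = class_features.mean(axis=0)
    cov = np.atleast_2d(np.cov(class_features, rowvar=False))
    cov = cov + eps * np.eye(class_features.shape[1])
    diff = x - mu
    # Solve cov @ z = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Synthetic stand-in for target-class features: wide along axis 0, narrow along axis 1.
rng = np.random.default_rng(0)
target = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])

a = np.array([3.0, 0.0])  # offset along the high-variance direction
b = np.array([0.0, 3.0])  # same Euclidean offset along the low-variance direction

d_a = mahalanobis_distance(a, target)
d_b = mahalanobis_distance(b, target)
print(d_a < d_b)
```

Both points have the same Euclidean norm, yet the point displaced along the low-variance direction is much farther in Mahalanobis terms. This is the sense in which such a distance accounts for how the target class is actually distributed in feature space.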
Abstract
As Artificial Intelligence (AI) models become more integral to our lives, the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized. One notable single-instance XAI approach is the counterfactual explanation, which helps users understand a model's decisions and offers guidance on how to alter them. For image classification models in particular, effective image counterfactual explanations can significantly enhance user understanding. This paper introduces a novel method for computing feature importance within the feature space of a black-box model. By employing information fusion techniques, our method makes fuller use of the available data when computing feature counterfactual explanations in the feature space. We then use an image generation model to transform these feature counterfactual explanations into image counterfactual explanations. Our experiments demonstrate that the counterfactual explanations generated by our method closely resemble the original images in both pixel and feature spaces, and that our method outperforms established baselines.
