Enhancing Counterfactual Image Generation Using Mahalanobis Distance with Distribution Preferences in Feature Space

Yukai Zhang, Ao Xu, Zihao Li, Tieru Wu

TL;DR

This work tackles the problem of generating faithful image counterfactual explanations for black-box image classifiers. It introduces a two-stage framework, combining Wasserstein-based feature-layer selection and fusion with Distribution Preference Mahalanobis Distance (DPMD) to produce counterfactuals in feature space, which are then mapped back to images via a GAN-based generator. The approach, termed DPMDCE, demonstrates improved fidelity in both latent (feature) and pixel spaces on MNIST and outperforms several baselines (Min-Edit, PIECE, CEM, Proto-CF) in key metrics. By explicitly modeling feature-space distributions and using distribution-aware distance, the method enhances interpretability and robustness of counterfactual explanations for black-box models, with practical implications for trusted AI deployments.
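
As background for the distance named above, here is a minimal sketch of the standard Mahalanobis distance between a feature vector and a class distribution. It is not the paper's DPMD, whose distribution-preference weighting is not specified in this summary. The helper names (`covariance_matrix`, `solve`, `mahalanobis`) and the small `ridge` term are assumptions made for illustration.

```python
import math


def mean_vector(rows):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, ridge=0.0):
    """Sample covariance (n - 1 denominator).

    `ridge` is a hypothetical stabiliser added to the diagonal so the
    matrix stays invertible when features are nearly collinear.
    """
    n = len(rows)
    mu = mean_vector(rows)
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    for i in range(d):
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without forming cov^{-1}."""
    diff = [a - b for a, b in zip(x, mu)]
    y = solve(cov, diff)
    return math.sqrt(sum(d * v for d, v in zip(diff, y)))
```

With a correlated covariance such as `[[2, 1], [1, 2]]`, the point `[1, 1]` lies along the direction of correlation and is closer to the origin under this distance than `[1, -1]`, even though both are equally far in Euclidean terms. That sensitivity to the shape of the distribution is what a distribution-aware distance provides.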

Abstract

In the realm of Artificial Intelligence (AI), the importance of Explainable Artificial Intelligence (XAI) is increasingly recognized, particularly as AI models become more integral to our lives. One notable single-instance XAI approach is counterfactual explanation, which aids users in comprehending a model's decisions and offers guidance on altering these decisions. Specifically in the context of image classification models, effective image counterfactual explanations can significantly enhance user understanding. This paper introduces a novel method for computing feature importance within the feature space of a black-box model. By employing information fusion techniques, our method maximizes the use of data to address feature counterfactual explanations in the feature space. Subsequently, we utilize an image generation model to transform these feature counterfactual explanations into image counterfactual explanations. Our experiments demonstrate that the counterfactual explanations generated by our method closely resemble the original images in both pixel and feature spaces. Additionally, our method outperforms established baselines, achieving impressive experimental results.

Paper Structure

This paper contains 27 sections, 20 equations, 6 figures, 1 table, and 1 algorithm.

Figures (6)

  • Figure 1: Our approach generates image counterfactual explanations that remain close to the original image in pixel space while also maintaining a sufficiently small distance from it in feature space.
  • Figure 2: Given the black box and an original image (Original IMG) predicted to be 9, we first select the layers of the black box whose features require fusion, then fuse the input-instance features and the distribution information vectors. Next, we use the two strategies we devised to find the most appropriate counterfactual explanation category, solve for the optimal counterfactual explanation in the feature space, and finally find the optimal input to a generative module to obtain the counterfactual image (Counterfactual IMG).
  • Figure 3: The process of finding the Merge Module.
  • Figure 4: Distribution of data from data class 9 and data from data class 4 on the 10th neuron and 16th neuron, respectively.
  • Figure 5: The first row is a randomly selected image from the MNIST test set; the other rows are counterfactual explanations of the images generated by different counterfactual explanation algorithms. The name of each algorithm is given in the first column, and the black-box model's prediction for each generated image is shown directly below it.
  • ...and 1 more figure
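
The per-neuron class distributions described for Figure 4 can be compared with a one-dimensional Wasserstein distance. The sketch below assumes equal-size samples of activations per class and neuron. The function names and the toy activation values are hypothetical, and the paper's actual layer-selection criterion is not given in this summary.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1D empirical samples.

    For equal-size samples, the optimal transport plan matches sorted
    values pairwise, so the distance is the mean absolute difference
    of the order statistics.
    """
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def rank_neurons(class_a, class_b):
    """Rank neuron indices by how far apart the two classes' activations are.

    `class_a` and `class_b` map a neuron index to a list of activations
    (an assumed data layout, chosen for illustration).
    """
    distances = {k: wasserstein_1d(class_a[k], class_b[k]) for k in class_a}
    return sorted(distances, key=distances.get, reverse=True)


# Toy activations for two neurons (made-up numbers, not from the paper).
class_9 = {10: [0.1, 0.2, 0.3], 16: [1.0, 1.1, 1.2]}
class_4 = {10: [0.3, 0.1, 0.2], 16: [2.0, 2.1, 2.2]}
ranking = rank_neurons(class_9, class_4)
```

In this toy data the two classes have identical activations on neuron 10 and shifted activations on neuron 16, so neuron 16 is ranked first as the more class-discriminative one.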
