Enhancing Counterfactual Explanation Search with Diffusion Distance and Directional Coherence

Marharyta Domnich, Raul Vicente

TL;DR

This work addresses the need for human-centric explanations by introducing CoDiCE, a counterfactual explainer that injects two cognitive biases into the search: diffusion distance, to favor feasible, manifold-consistent transitions, and directional coherence, to align joint feature changes with their marginal effects on the model output. These biases are integrated into a genetic-algorithm-based optimization objective that balances loss, diffusion proximity, sparsity, and coherence, with separate handling for continuous and categorical features. Across synthetic and real datasets with continuous and mixed-type features, CoDiCE variants demonstrate higher validity and more connected, coherent counterfactuals than baseline methods, illustrating the value of incorporating data geometry and human-aligned directional constraints. The findings reveal a trade-off between diffusion distance and directional coherence, suggesting future work in multi-objective optimization to yield diverse, high-quality explanations that remain faithful to the data manifold and to intuitive user expectations.

Abstract

A pressing issue in the adoption of AI models is the increasing demand for more human-centric explanations of their predictions. Understanding how humans produce and select explanations has helped advance towards such explanations. In this work, inspired by insights from human cognition, we propose and test the incorporation of two novel biases to enhance the search for effective counterfactual explanations. Central to our methodology is the application of diffusion distance, which emphasizes data connectivity and actionability in the search for feasible counterfactual explanations. In particular, diffusion distance gives greater weight to points that are interconnected by numerous short paths. This brings closely connected points nearer to each other and identifies a feasible path between them. We also introduce a directional coherence term that expresses a preference for alignment between the joint and marginal directional changes in feature space needed to reach a counterfactual. This term enables the generation of counterfactual explanations that align with a set of marginal predictions based on expectations of how the outcome of the model varies when changing one feature at a time. We evaluate our method, named Coherent Directional Counterfactual Explainer (CoDiCE), and the impact of the two novel biases against existing methods such as DiCE, FACE, Prototypes, and Growing Spheres. Through a series of ablation experiments on both synthetic and real datasets with continuous and mixed-type features, we demonstrate the effectiveness of our method.
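
To make the notion of diffusion distance in the abstract more concrete, the following is a minimal sketch using the standard diffusion-map construction (Gaussian kernel, row-normalized random walk). It is not taken from the paper: the function name `diffusion_distances`, the kernel width `sigma`, the number of steps `t`, and the toy two-strip dataset are all assumptions chosen for illustration. The toy data places point C in the same densely connected strip as A and point B in a separate strip, both at the same Euclidean distance from A.

```python
import numpy as np

def diffusion_distances(X, source, sigma=0.2, t=10):
    """Diffusion distance from X[source] to every row of X after t random-walk steps.

    Illustrative only: sigma and t are assumed values, not the paper's settings.
    """
    # Pairwise squared Euclidean distances and Gaussian affinities.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    # Row-normalize the affinities into a Markov transition matrix.
    degree = K.sum(axis=1)
    P = K / degree[:, None]
    Pt = np.linalg.matrix_power(P, t)
    # Stationary distribution of the walk, used to weight the comparison.
    pi = degree / degree.sum()
    # Compare the t-step transition profile of each point with the source's.
    diff = Pt - Pt[source]
    return np.sqrt((diff ** 2 / pi).sum(axis=1))

# Toy data (hypothetical): two dense vertical strips separated by an empty gap.
ys = np.linspace(0.0, 2.0, 21)
strip_left = np.column_stack([np.zeros_like(ys), ys])
strip_right = np.column_stack([np.ones_like(ys), ys])
X = np.vstack([strip_left, strip_right])

A, C, B = 0, 10, 21  # A=(0,0), C=(0,1) in the same strip, B=(1,0) across the gap
d = diffusion_distances(X, source=A)

print("Euclidean A-C:", np.linalg.norm(X[A] - X[C]))
print("Euclidean A-B:", np.linalg.norm(X[A] - X[B]))
print("Diffusion A-C < diffusion A-B:", d[C] < d[B])
```

In this sketch, A and C are linked by many short hops through neighboring points, whereas A and B are linked only through very weak affinities across the gap, so the diffusion distance separates the two pairs even though their Euclidean distances are identical.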

Paper Structure (21 sections, 13 equations, 5 figures, 4 tables, 3 algorithms)

Figures (5)

  • Figure 1: Illustration of the concept of diffusion distance and its use for counterfactual search. Left panel: Points connected by numerous short paths (A-C) exhibit a shorter diffusion distance than pairs of points whose connections pass through a bottleneck or low-density region (A-B). Note that, measured by Euclidean distance, the pairwise distances A-C and A-B would be exactly the same. Right panel: 3D S-shaped synthetic dataset with two classes. The input point belongs to class 0; the diffusion distances between this point and six counterfactual candidates are displayed.
  • Figure 2: Illustration of Directional Coherence. The input point belongs to Class 1. Given counterfactual candidates $CF_1$ and $CF_2$ at equal distance from the original input point, we deem $CF_1$ incoherent with respect to the expected effect of changing Feature 1. Intuitively, $CF_1$ suggests decreasing Feature 1, while the effect of increasing either Feature 1 or Feature 2 is to increase the posterior probability of predicting Class 2. For the other counterfactual ($CF_2$), the direction of marginal changes (changing one feature at a time) agrees with the joint direction of changes, resulting in a more coherent counterfactual point.
  • Figure 3: Counterfactual search on synthetic datasets. The counterfactuals obtained for the S surface and Swiss roll illustrate how diffusion distance (panels b and d) takes into account the connectivity of the data manifold, as opposed to $L_{1}$ (panels a and c) or, more generally, other $L_{p}$ distances.
  • Figure 4: Counterfactual search on the Diabetes dataset projected onto PCA coordinates (a) and t-SNE coordinates (b). The plot shows triplets of points connected by dotted lines: the input instance (blue circle), its counterfactual obtained with diffusion distance (green triangle), and its counterfactual obtained with weighted $L_1$ distance (red square).
  • Figure 5: Trade-off between diffusion distance and directional coherence penalty is explored as the diffusion weight is increased.
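
The directional coherence idea illustrated in Figure 2 can be sketched in a few lines. This is one possible reading of the caption rather than the paper's exact formulation, which is given in its equations and not reproduced here. The logistic model `predict_proba`, its weights, the helper `directional_coherence`, and the two candidate points are hypothetical. The score is the fraction of changed features whose marginal effect (changing that feature alone) moves the model output in the same direction as the joint change.

```python
import numpy as np

def predict_proba(x):
    """Hypothetical model: probability of Class 2 rises with either feature."""
    w = np.array([1.0, 2.0])
    b = -3.5
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def directional_coherence(x, cf, predict):
    """Fraction of changed features whose one-at-a-time effect on the model
    output has the same sign as the effect of the full counterfactual."""
    base = predict(x)
    joint_sign = np.sign(predict(cf) - base)
    changed = np.flatnonzero(cf != x)
    if changed.size == 0:
        return 1.0
    agree = 0
    for i in changed:
        x_marginal = x.copy()
        x_marginal[i] = cf[i]  # apply only the change to feature i
        marginal_sign = np.sign(predict(x_marginal) - base)
        agree += int(marginal_sign == joint_sign)
    return agree / changed.size

x = np.array([1.0, 1.0])     # input point (predicted Class 1)
cf_1 = np.array([0.0, 2.0])  # decreases Feature 1, increases Feature 2
cf_2 = np.array([2.0, 2.0])  # increases both features

score_1 = directional_coherence(x, cf_1, predict_proba)
score_2 = directional_coherence(x, cf_2, predict_proba)
print(score_1, score_2)  # 0.5 1.0
```

Both candidates lie at the same Euclidean distance from the input and both cross the decision boundary of the toy model, yet only `cf_2` changes every feature in the direction that a one-feature-at-a-time view would suggest.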