Enhancing Counterfactual Explanation Search with Diffusion Distance and Directional Coherence
Marharyta Domnich, Raul Vicente
TL;DR
This work addresses the need for human-centric explanations by introducing CoDiCE, a counterfactual explainer that injects two cognitive biases into the search: diffusion distance, which favors feasible, manifold-consistent transitions, and directional coherence, which aligns joint feature changes with their marginal effects on the model output. These biases are integrated into a genetic-algorithm-based optimization objective that balances loss, diffusion proximity, sparsity, and coherence, with separate handling for continuous and categorical features. Across synthetic and real datasets with continuous and mixed-type features, CoDiCE variants achieve higher validity and produce more connected, coherent counterfactuals than baseline methods, illustrating the value of incorporating data geometry and human-aligned directional constraints. The findings reveal a trade-off between diffusion distance and directional coherence, suggesting future work on multi-objective optimization to yield diverse, high-quality explanations that remain faithful to the data manifold and to intuitive user expectations.
Abstract
A pressing issue in the adoption of AI models is the increasing demand for more human-centric explanations of their predictions. Progress towards such explanations has benefited from understanding how humans produce and select explanations. In this work, inspired by insights from human cognition, we propose and test the incorporation of two novel biases to enhance the search for effective counterfactual explanations. Central to our methodology is the application of diffusion distance, which emphasizes data connectivity and actionability in the search for feasible counterfactual explanations. In particular, diffusion distance gives more weight to points that are interconnected by numerous short paths. This brings closely connected points nearer to each other and identifies a feasible path between them. We also introduce a directional coherence term that expresses a preference for alignment between the joint and marginal directional changes in feature space needed to reach a counterfactual. This term enables the generation of counterfactual explanations that align with a set of marginal predictions, that is, expectations of how the outcome of the model varies when one feature is changed at a time. We evaluate our method, named Coherent Directional Counterfactual Explainer (CoDiCE), and the impact of the two novel biases against existing methods such as DiCE, FACE, Prototypes, and Growing Spheres. Through a series of ablation experiments on both synthetic and real datasets with continuous and mixed-type features, we demonstrate the effectiveness of our method.
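
To make the notion of diffusion distance concrete, the following is a minimal sketch of the standard diffusion-map construction, not the CoDiCE implementation. It assumes a Gaussian kernel, and the function name `diffusion_distances`, the bandwidth `epsilon`, the number of diffusion steps `t`, and the toy points are all illustrative choices that do not come from the paper.

```python
import numpy as np

def diffusion_distances(X, epsilon=0.5, t=3):
    """Pairwise diffusion distances after t random-walk steps (illustrative sketch).

    Builds an n x n x n intermediate array, so it is only suitable for small n.
    """
    # Gaussian affinities from squared Euclidean distances (assumed kernel).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)

    # Row-normalize to get the transition matrix of a random walk on the data.
    degrees = K.sum(axis=1)
    P = K / degrees[:, None]

    # Stationary distribution of the walk, used to weight the distance.
    pi = degrees / degrees.sum()

    # t-step transition probabilities.
    Pt = np.linalg.matrix_power(P, t)

    # D_t(i, j)^2 = sum_k (Pt[i, k] - Pt[j, k])^2 / pi[k]
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt(np.sum(diff ** 2 / pi[None, None, :], axis=-1))

# Toy data: A and B are linked by a chain of intermediate points,
# while C lies at the same Euclidean distance from A but is isolated.
X = np.array([
    [0.0, 0.0],  # A (index 0)
    [0.5, 0.0],
    [1.0, 0.0],
    [1.5, 0.0],
    [2.0, 0.0],  # B (index 4)
    [0.0, 2.0],  # C (index 5)
])
D = diffusion_distances(X)
print("Euclidean A-B:", np.linalg.norm(X[0] - X[4]), " diffusion A-B:", D[0, 4])
print("Euclidean A-C:", np.linalg.norm(X[0] - X[5]), " diffusion A-C:", D[0, 5])
```

In this toy layout, B and C are equally far from A in Euclidean terms, but the diffusion distance from A to C is larger because no short paths through the data connect them. This is the sense in which well-connected points are drawn nearer to each other.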
