Refining Counterfactual Explanations With Joint-Distribution-Informed Shapley Towards Actionable Minimality
Lei You, Yijun Bian, Lele Cao
TL;DR
This work addresses the problem of generating actionable, minimal counterfactual explanations by removing unnecessary feature changes without sacrificing validity. It introduces COLA, a versatile framework that couples counterfactual generation with joint-distribution-informed Shapley attributions and uses an optimal-transport plan to align factual and counterfactual data. Theoretical results bound the counterfactual effect by the transport cost under a Lipschitz model, and empirical results show substantial reductions in the number of required actions while preserving the explanations' impact across diverse datasets and models. The findings indicate that a carefully learned joint distribution, rather than an exact instance-level alignment, yields more effective and actionable explanations.
Abstract
Counterfactual explanations (CE) identify data points that closely resemble the observed data but produce different machine learning (ML) model outputs, offering critical insights into model decisions. Although existing CE methods are tailored to diverse scenarios, goals, and tasks, they often lack actionable efficiency because the explanations presented to users and stakeholders include unnecessary feature changes. We address this problem by proposing a method that minimizes the required feature changes while maintaining the validity of CE, without imposing restrictions on models or CE algorithms, whether instance- or group-based. The key innovation lies in computing a joint distribution between observed and counterfactual data and leveraging it to inform Shapley values for feature attributions (FA). We demonstrate that optimal transport (OT) effectively derives this distribution, especially when the CE method in use does not clearly define the alignment between observed and counterfactual data. We also uncover a counterintuitive finding: relying on the exact alignment defined by the CE generation mechanism can be misleading when conducting FA. The proposed method is validated through extensive experiments across multiple datasets, which show its effectiveness in refining CE towards greater actionable efficiency.
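To make the role of the joint distribution concrete, the following is a minimal sketch of one way to derive an OT plan between a small set of observed rows and a set of counterfactual rows. It is an illustration under stated assumptions, not the paper's implementation: both sets are assumed to have the same size and uniform weights (in which case an optimal plan can be found among one-to-one matchings), the cost is assumed to be squared Euclidean distance, and the search is brute force, so it only suits tiny inputs. The function names and the toy data are hypothetical.

```python
from itertools import permutations


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors (assumed cost)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def uniform_ot_plan(factual, counterfactual):
    """Brute-force OT plan for equal-size sets with uniform weights.

    Returns (plan, cost), where plan[i][j] is the probability mass that
    couples factual row i with counterfactual row j, and cost is the
    expected squared distance under that plan.
    """
    n = len(factual)
    if n == 0 or n != len(counterfactual):
        raise ValueError("this sketch assumes two non-empty sets of equal size")

    best_cost, best_perm = None, None
    # Enumerate every one-to-one matching; feasible only for very small n.
    for perm in permutations(range(n)):
        cost = sum(
            squared_distance(factual[i], counterfactual[j])
            for i, j in enumerate(perm)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm

    # Each matched pair carries mass 1/n, so rows and columns sum to 1/n.
    plan = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(best_perm):
        plan[i][j] = 1.0 / n
    return plan, best_cost / n


# Toy data (hypothetical): the counterfactual rows arrive in an order that
# does not correspond to the order of the observed rows.
factual = [[0.0, 0.0], [10.0, 10.0]]
counterfactual = [[11.0, 10.0], [1.0, 0.0]]

plan, cost = uniform_ot_plan(factual, counterfactual)
```

In this toy case the plan couples each observed row with its nearby counterfactual row even though the two lists are ordered differently, which is the kind of situation the abstract describes as an unclear alignment. How such a plan then enters the Shapley computation is specific to the paper and is not reproduced here.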
