Actionable and diverse counterfactual explanations incorporating domain knowledge and causal constraints
Szymon Bobek, Łukasz Bałec, Grzegorz J. Nalepa
TL;DR
This work tackles the problem of generating counterfactual explanations that are both actionable and plausible in real-world settings by embedding domain knowledge and causal constraints into the generation process. The authors introduce DANCE, a framework that captures linear and nonlinear feature dependencies (learned from data as dependency graphs or supplied by experts) and optimizes a composite loss balancing fidelity, proximity, sparsity, diversity, and plausibility using a Tree-structured Parzen Estimator (TPE). A comprehensive evaluation on 140 OpenML datasets and a real-world case study with Freshmail shows that DANCE often outperforms existing methods on key metrics while producing domain-consistent suggestions. The study highlights the practical value of incorporating domain constraints into XAI to improve adoption, trust, and impact in email-marketing security and beyond. The code is released as open source for reproducibility.
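The composite objective described above can be illustrated with a small sketch. It assumes a hypothetical two-feature toy classifier (`predict`), L1 distances, and hand-picked weights for the five terms. The helper names `composite_loss` and `search` are illustrative, not DANCE's API, and plain random search stands in for the Tree-structured Parzen Estimator, so this shows the shape of such an objective rather than the authors' optimizer.

```python
import random

# Hypothetical toy classifier standing in for the explained model:
# predicts class 1 when 2*x0 + x1 > 3, otherwise class 0.
def predict(x):
    return 1 if 2 * x[0] + x[1] > 3 else 0


def l1(a, b):
    """L1 (Manhattan) distance between two feature vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))


def composite_loss(cfs, x, target, data, w=(2.0, 0.5, 0.1, 0.2, 0.3)):
    """Illustrative composite loss for a set of counterfactuals (lower is better).

    The weights w = (fidelity, proximity, sparsity, diversity, plausibility)
    and each term's definition are assumptions, not DANCE's exact formulation.
    """
    w_fid, w_prox, w_sp, w_div, w_pl = w
    k = len(cfs)
    # Fidelity: fraction of counterfactuals that fail to reach the target class.
    fidelity = sum(predict(c) != target for c in cfs) / k
    # Proximity: mean L1 distance from the original instance.
    proximity = sum(l1(c, x) for c in cfs) / k
    # Sparsity: mean number of changed features.
    sparsity = sum(sum(abs(ci - xi) > 1e-9 for ci, xi in zip(c, x)) for c in cfs) / k
    # Diversity: mean pairwise distance between counterfactuals (rewarded, so subtracted).
    pairs = [(a, b) for i, a in enumerate(cfs) for b in cfs[i + 1:]]
    diversity = sum(l1(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    # Plausibility: mean distance to the nearest observed data point.
    plausibility = sum(min(l1(c, d) for d in data) for c in cfs) / k
    return (w_fid * fidelity + w_prox * proximity + w_sp * sparsity
            - w_div * diversity + w_pl * plausibility)


def search(x, target, data, k=3, trials=2000, seed=0):
    """Random search used here as a simple stand-in for TPE."""
    rng = random.Random(seed)
    # Start from the trivial set (k copies of x) so the result never scores worse.
    best = [list(x) for _ in range(k)]
    best_loss = composite_loss(best, x, target, data)
    for _ in range(trials):
        # Perturb a random subset of features to encourage sparse changes.
        cfs = [[xi + rng.uniform(-2, 2) if rng.random() < 0.5 else xi for xi in x]
               for _ in range(k)]
        loss = composite_loss(cfs, x, target, data)
        if loss < best_loss:
            best, best_loss = cfs, loss
    return best, best_loss


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 1.0], [2.0, 1.0], [1.5, 2.0]]  # toy observations
    cfs, loss = search([0.5, 0.5], target=1, data=data)
    for c in cfs:
        print([round(v, 2) for v in c], predict(c))
```

Swapping the random loop for a TPE sampler (for example Optuna's `TPESampler`) would propose candidates by modelling the densities of previously good and bad trials instead of sampling uniformly.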
Abstract
Counterfactual explanations enhance the actionable interpretability of machine learning models by identifying the minimal changes to an input that are required to obtain a desired model outcome. However, existing methods often ignore the complex dependencies present in real-world datasets, leading to unrealistic or impractical modifications. Motivated by cybersecurity applications in the email marketing domain, we propose a method for generating Diverse, Actionable, and kNowledge-Constrained Explanations (DANCE), which incorporates feature dependencies and causal constraints to ensure the plausibility and real-world feasibility of counterfactuals. Our method learns linear and nonlinear constraints from data or integrates expert-provided dependency graphs. By keeping counterfactuals consistent with these feature relationships, the method produces explanations that respect real-world constraints. Additionally, it balances plausibility, diversity, and sparsity, addressing key limitations of existing algorithms. The work is grounded in a real-life case study with Freshmail, the largest email marketing company in Poland, and was supported by the joint R&D project Sendguard. Furthermore, we provide an extensive evaluation on 140 public datasets, which shows that the method generates meaningful, domain-relevant counterfactuals that outperform existing approaches on widely used metrics. The source code for reproducing the results is available in a public GitHub repository.
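The idea of learning a dependency from data and keeping counterfactuals consistent with it can be sketched for the simplest case: a single linear relation between a parent and a child feature, fitted by ordinary least squares. The feature names (`emails_sent`, `bounces`), the toy values, the residual-based tolerance, and the helper functions are all hypothetical; DANCE's own procedure, which also covers nonlinear constraints and expert-provided graphs, is not reproduced here.

```python
# Hypothetical features: "emails_sent" (parent) drives "bounces" (child).
# The linear form and the tolerance rule are illustrative assumptions.

def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def max_residual(xs, ys, slope, intercept):
    """Largest absolute training residual, used as a consistency tolerance."""
    return max(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


def is_consistent(cf, parent, child, slope, intercept, tol):
    """True if the counterfactual respects the learned dependency within tol."""
    return abs(cf[child] - (slope * cf[parent] + intercept)) <= tol


def enforce(cf, parent, child, slope, intercept):
    """Return a copy of cf whose child feature is recomputed from its parent."""
    fixed = dict(cf)
    fixed[child] = slope * fixed[parent] + intercept
    return fixed


if __name__ == "__main__":
    sent = [100, 200, 300, 400]   # toy parent observations (not from the paper)
    bounces = [3, 5, 7, 9]        # toy child observations (not from the paper)
    a, b = fit_linear(sent, bounces)
    tol = max_residual(sent, bounces, a, b) + 1e-9
    cf = {"emails_sent": 250, "bounces": 0}  # candidate that edits bounces alone
    print(is_consistent(cf, "emails_sent", "bounces", a, b, tol))
    print(enforce(cf, "emails_sent", "bounces", a, b))
```

Recomputing the child from its parent, rather than rejecting the candidate, follows the causal intuition that downstream features should track upstream changes; rejecting the candidate is the alternative when a feature cannot sensibly be recomputed.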
