Preserving Causal Constraints in Counterfactual Explanations for Machine Learning Classifiers

Divyat Mahajan, Chenhao Tan, Amit Sharma

TL;DR

The paper tackles the challenge of generating feasible counterfactual (CF) explanations for ML classifiers by treating feasibility as a causal constraint problem. It introduces a causal proximity loss, based on structural causal models, that preserves inter-feature relationships, and a VAE-based method, Example-Based CF, that learns feasibility from user feedback. The methods are evaluated on Bayesian network (BN), Sangiovese, and Adult datasets, showing improved feasibility over prior approaches while maintaining high target-class validity and offering faster generation. The work highlights the importance of incorporating causal structure and user input for actionable CF explanations in sensitive domains.

Abstract

To construct interpretable explanations that are consistent with the original ML model, counterfactual examples, which show how the model's output changes with small perturbations to the input, have been proposed. This paper extends the work in counterfactual explanations by addressing the challenge of feasibility of such examples. For explanations of ML models in critical domains such as healthcare and finance, counterfactual examples are useful for an end-user only to the extent that perturbation of feature inputs is feasible in the real world. We formulate the problem of feasibility as preserving causal relationships among input features and present a method that uses (partial) structural causal models to generate actionable counterfactuals. When feasibility constraints cannot be easily expressed, we consider an alternative mechanism where people can label generated counterfactual (CF) examples on feasibility: whether it is feasible to intervene and realize the candidate CF example from the original input. To learn from this labelled feasibility data, we propose a modified variational autoencoder loss for generating CF examples that optimizes for feasibility as people interact with its output. Our experiments on Bayesian networks and the widely used "Adult-Income" dataset show that our proposed methods can generate counterfactual explanations that better satisfy feasibility constraints than existing methods. The code repository can be accessed here: https://github.com/divyat09/cf-feasibility
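
The idea of scoring a counterfactual against a structural causal model can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical linear structural equation (savings = income - rent) over the Income, House rent, and Savings features, and the names `SCM` and `causal_proximity` are made up for illustration. For a feature with causal parents, the distance is taken between the counterfactual value and the value the structural equation predicts from the counterfactual's parents. For a feature without parents, it falls back to an ordinary L1 distance to the original input.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical SCM: each endogenous feature maps to (parent names, structural equation).
# The linear equation below is an illustrative assumption, not taken from the paper.
SCM: Dict[str, Tuple[List[str], Callable[..., float]]] = {
    "savings": (["income", "rent"], lambda income, rent: income - rent),
}


def causal_proximity(x: Dict[str, float], x_cf: Dict[str, float]) -> float:
    """Sum of per-feature distances between an input x and a counterfactual x_cf."""
    total = 0.0
    for name, cf_value in x_cf.items():
        if name in SCM:
            # Endogenous feature: compare with the value implied by its CF parents.
            parents, equation = SCM[name]
            expected = equation(*(x_cf[p] for p in parents))
            total += abs(cf_value - expected)
        else:
            # Feature without parents: plain L1 distance to the original input.
            total += abs(cf_value - x[name])
    return total


x = {"income": 50.0, "rent": 20.0, "savings": 30.0}
# Income rises and savings rise with it, as the assumed equation requires.
consistent = {"income": 60.0, "rent": 20.0, "savings": 40.0}
# Income rises but savings stay fixed, violating the assumed equation.
inconsistent = {"income": 60.0, "rent": 20.0, "savings": 30.0}

score_consistent = causal_proximity(x, consistent)
score_inconsistent = causal_proximity(x, inconsistent)
```

Under these assumptions, the counterfactual that changes income without adjusting savings receives the larger penalty, which is the kind of inter-feature consistency the abstract describes.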

Paper Structure

This paper contains 33 sections, 2 theorems, 15 equations, 9 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

The evidence lower bound to optimize the CF objective $\Pr(\bm{x}^{cf}| y', \bm{x})$ is:

Figures (9)

  • Figure 1: Defining the proximity loss with SCM.
  • Figure 2: Constraint-Feasibility for three datasets, Causal-Edge score for BN1 and Sangiovese.
  • Figure 3: Constraint-Feasibility score as the number of labelled examples is increased for global constraints in Adult.
  • Figure 4: SCM describing the true causal relationships between three input features: Income, House rent, and Savings of a person. These input features are used by a pre-trained black-box ML model that we wish to explain.
  • Figure 5: Validity, Continuous Proximity and Categorical Proximity metrics for different CF explanation methods.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Theorem 1
  • Theorem 2
  • proof