
Generating Counterfactual Explanations Using Cardinality Constraints

Rubén Ruiz-Torrubiano

TL;DR

The paper improves the interpretability of counterfactual explanations by enforcing a sparsity constraint: a counterfactual may differ from the original instance in at most $k$ features. It extends the CERTIFAI framework with a cardinality constraint, using a card_distance term and a linear penalty $c_{card}$ to fold the constraint into the objective while minimizing $d(\mathbf{x}, \hat{\mathbf{x}})$. Empirical results on the drug200 and Car Evaluation datasets show that constrained counterfactuals achieve lower cardinality than unconstrained ones (the unconstrained average $\hat{k}$ is around 3.1 on drug200 and around 1.6 on Car Evaluation) and are easier to interpret than unconstrained counterfactuals that alter many features. The approach is model-agnostic, operates on tabular data via a genetic algorithm, and offers a practical path toward more transparent explanations. Planned future work is to improve the genetic operators and validate the method on larger datasets.

Abstract

Providing explanations about how machine learning algorithms work and/or make particular predictions is one of the main tools that can be used to improve their trustworthiness, fairness and robustness. Among the most intuitive types of explanations are counterfactuals, which are examples that differ from a given point only in the prediction target and some set of features, showing which features need to be changed in the original example to flip the prediction for that example. However, such counterfactuals can differ from the original example in many features, making their interpretation difficult. In this paper, we propose to explicitly add a cardinality constraint to counterfactual generation that limits how many features can differ from the original example, thus providing more interpretable and easily understandable counterfactuals.
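
To make the cardinality constraint concrete, the following minimal Python sketch shows one way a feature-count term and a linear penalty could be combined with a distance that is being minimized. It is an illustration under stated assumptions, not the paper's implementation: the function names `mixed_distance` and `penalized_objective`, the choice of distance, the hinge form of the penalty (only features beyond $k$ are penalized), the default value of `c_card`, and the sample records are all hypothetical.

```python
from numbers import Number


def card_distance(x, x_hat, tol=1e-9):
    """Count the features in which the candidate x_hat differs from x."""
    changed = 0
    for a, b in zip(x, x_hat):
        if isinstance(a, Number) and isinstance(b, Number):
            # Numeric features count as changed only beyond a small tolerance.
            if abs(a - b) > tol:
                changed += 1
        elif a != b:
            # Categorical features count as changed when the labels differ.
            changed += 1
    return changed


def mixed_distance(x, x_hat):
    """Hypothetical distance d(x, x_hat) for tabular data:
    absolute difference for numeric features, 0/1 mismatch for categorical ones."""
    total = 0.0
    for a, b in zip(x, x_hat):
        if isinstance(a, Number) and isinstance(b, Number):
            total += abs(a - b)
        else:
            total += 0.0 if a == b else 1.0
    return total


def penalized_objective(x, x_hat, k, c_card=10.0):
    """Distance plus a linear penalty on every changed feature beyond k.
    The hinge form max(0, card - k) is an assumption of this sketch."""
    excess = max(0, card_distance(x, x_hat) - k)
    return mixed_distance(x, x_hat) + c_card * excess


# Illustrative records (age, sex, blood pressure, cholesterol, ratio).
x = [47, "M", "LOW", "HIGH", 13.9]
sparse = [47, "M", "NORMAL", "HIGH", 13.9]   # changes 1 feature
dense = [50, "F", "NORMAL", "HIGH", 13.9]    # changes 3 features

sparse_score = penalized_objective(x, sparse, k=2)
dense_score = penalized_objective(x, dense, k=2)
```

With $k = 2$, the sparse candidate stays within the budget and is scored by its distance alone, while the dense candidate exceeds the budget by one feature and pays the penalty once. In a genetic algorithm such as the one CERTIFAI uses, a score of this kind would steer the population toward candidates that change few features.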

Paper Structure (6 sections, 2 tables)