Supervised Feature Compression based on Counterfactual Analysis

Veronica Piccialli, Dolores Romero Morales, Cecilia Salvatore

TL;DR

Addressing the interpretability gap for black-box binary classifiers, this work introduces FCCA, a framework that uses counterfactual explanations to identify critical feature thresholds and construct a supervised discretization. The discretization enables training an optimal univariate decision tree that faithfully mirrors the target model's decision boundaries with tunable granularity via $Q$. Empirical results across diverse datasets demonstrate competitive accuracy, meaningful compression, and controllable sparsity, with compatibility across different target algorithms (e.g., Gradient Boosting, Random Forest, Linear SVM) and surrogates (CART, GOSDT). The approach provides a scalable path to compact, boundary-aligned, interpretable rules and opens avenues for fairness-oriented extensions and combinatorial optimization of discretizations.

Abstract

Counterfactual Explanations are becoming a de facto standard in post-hoc interpretable machine learning. For a given classifier and an instance classified in an undesired class, a counterfactual explanation is a small perturbation of that instance that changes the classification outcome. This work leverages Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model. This information is used to build a supervised discretization of the features in the dataset with a tunable granularity. Using the discretized dataset, an optimal Decision Tree can be trained that resembles the black-box model but is interpretable and compact. Numerical results on real-world datasets show the effectiveness of the approach in terms of accuracy and sparsity.
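
To make the definition of a counterfactual explanation concrete, the sketch below handles the simplest possible target model, a linear classifier, where the closest counterfactual under Euclidean distance has a closed form: project the instance onto the separating hyperplane and step slightly across it. The function names, the `margin` parameter, and the choice of Euclidean distance are illustrative assumptions; this is not the optimization problem the paper solves for its target models.

```python
def predict(w, b, x):
    """Linear classifier: class 1 if w.x + b >= 0, otherwise class 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


def linear_counterfactual(w, b, x0, margin=1e-6):
    """Return the Euclidean-closest point to x0 that is classified differently.

    Assumption: the target model is linear, so moving along the normal
    vector w is the shortest way to cross the decision boundary.
    `margin` controls how far past the boundary the point lands.
    """
    score = sum(wi * xi for wi, xi in zip(w, x0)) + b
    norm_sq = sum(wi * wi for wi in w)
    # Aim for a score just on the other side of zero.
    target = margin if score < 0 else -margin
    step = (target - score) / norm_sq
    return [xi + step * wi for xi, wi in zip(x0, w)]


if __name__ == "__main__":
    w, b = [1.0, 2.0], -4.0
    x0 = [1.0, 1.0]
    x_ce = linear_counterfactual(w, b, x0)
    print(predict(w, b, x0), predict(w, b, x_ce), x_ce)
```

For non-linear targets such as tree ensembles no closed form of this kind exists, and the counterfactual has to be found by solving an optimization problem instead.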

Paper Structure

This paper contains 13 sections, 10 equations, 14 figures, 2 tables, 1 algorithm.

Figures (14)

  • Figure 1: Closeness of a Counterfactual Explanation to the Decision Boundaries of the Target model produced by a Random Forest projected on two features (Income and # years of credit).
  • Figure 2: Assume that $x^{CE}$ is computed by solving problem \ref{eq:CE} with $x^0$ as input for a given Tree Ensemble acting as a target model. A perturbation ($\pm \epsilon$) of the values of the features where $x^0$ and $x^{CE}$ differ can be used as splitting values for the nodes of a univariate Decision Tree.
  • Figure 3: Accuracy results on the benchmark datasets. We compare the performance of CART and GOSDT trained on the initial dataset with continuous features, the dataset discretized with the GTRE procedure, and the dataset discretized with the FCCA procedure. GOSDT cannot be applied directly to the dataset with continuous features. As reported in Table \ref{tab:datasets}, for datasets with few observations (boston, arrhythmia and ionosphere) the accuracy is computed in a $k$-fold cross-validation, while for datasets with many observations (magic, particle and vehicle) the accuracy is the average, on the external test set, of the $k$ classifiers trained in $k$-fold cross-validation.
  • Figure 4: Number of features used on the benchmark datasets. We compare the performance of CART and GOSDT trained on the initial dataset with continuous features, the dataset discretized with the GTRE procedure, and the dataset discretized with the FCCA procedure.
  • Figure 5: Compression rate on the benchmark datasets. We compare the performance of CART and GOSDT trained on the initial dataset with continuous features, the dataset discretized with the GTRE procedure, and the dataset discretized with the FCCA procedure.
  • ...and 9 more figures
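
The idea in the caption of Figure 2 can be sketched in a few lines: for each pair $(x^0, x^{CE})$, every feature on which the two points differ contributes a candidate splitting value, and the collected values define binary features for a univariate tree. Everything below is a hypothetical illustration: the function names, the `eps` and `tol` parameters, and in particular the rule for applying $\pm \epsilon$ (nudging the counterfactual's value back toward $x^0$ so that the split separates the two points) are assumptions, since the caption does not spell out the exact rule.

```python
def candidate_thresholds(pairs, eps=1e-3, tol=1e-9):
    """Collect per-feature split candidates from (x0, x_ce) pairs.

    Assumption: a feature counts as "changed" when the two values differ
    by more than `tol`, and the threshold is the counterfactual's value
    shifted by `eps` toward the original instance.
    """
    thresholds = {}
    for x0, x_ce in pairs:
        for j, (a, c) in enumerate(zip(x0, x_ce)):
            if abs(a - c) > tol:
                t = c - eps if c > a else c + eps
                thresholds.setdefault(j, set()).add(t)
    # Sort so that the resulting binary encoding is deterministic.
    return {j: sorted(ts) for j, ts in thresholds.items()}


def binarize(x, thresholds):
    """Encode x as one 0/1 indicator (x[j] <= t) per collected threshold."""
    return [int(x[j] <= t) for j in sorted(thresholds) for t in thresholds[j]]


if __name__ == "__main__":
    pairs = [([1.0, 5.0], [2.0, 5.0]), ([3.0, 4.0], [3.0, 2.5])]
    th = candidate_thresholds(pairs, eps=0.5)
    print(th)
    print(binarize([1.0, 5.0], th), binarize([2.0, 5.0], th))
```

Features that never change between an instance and its counterfactual receive no threshold at all in this sketch, so they drop out of the binary encoding entirely.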