
FLEX: Feature Importance from Layered Counterfactual Explanations

Nawid Keshtmand, Roussel Desmond Nzoyem, Jeffrey Nicholas Clark

TL;DR

FLEX addresses the interpretability gap of black-box models by deriving feature importances from counterfactual explanations at local, regional, and global levels. It is model- and domain-agnostic, works with a range of counterfactual generators, and applies a magnitude threshold so that only substantive feature changes are counted. Experiments on traffic accident severity and loan approval show that FLEX's global rankings align with SHAP while exposing region-specific drivers that global summaries miss, and that the correlation between regional and global importances varies substantially across subpopulations. The framework is also computationally more efficient than Kernel SHAP and provides uncertainty estimates via feature-change frequencies, enabling targeted, context-aware recourse in risk-sensitive domains.
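The magnitude threshold mentioned above can be made concrete with a minimal local sketch. It assumes that the local score of feature j is the fraction of a factual's counterfactuals in which |x'_j − x_j| exceeds a per-feature threshold τ_j. The function name `local_change_frequency`, the threshold values, and the toy data are hypothetical, and the counterfactuals are taken as given rather than produced by any particular generator.

```python
from typing import Sequence


def local_change_frequency(
    factual: Sequence[float],
    counterfactuals: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> list[float]:
    """Fraction of counterfactuals in which each feature changes by more than its threshold.

    Assumption (hypothetical rule): feature j counts as changed when
    |counterfactual[j] - factual[j]| > thresholds[j].
    """
    if not counterfactuals:
        raise ValueError("need at least one counterfactual")
    n_features = len(factual)
    counts = [0] * n_features
    for cf in counterfactuals:
        for j in range(n_features):
            # Changes at or below the threshold are treated as not substantive.
            if abs(cf[j] - factual[j]) > thresholds[j]:
                counts[j] += 1
    return [c / len(counterfactuals) for c in counts]


# Toy example: one factual, N_cf = 3 counterfactuals, three numeric features.
factual = [0.2, 5.0, 1.0]
counterfactuals = [[0.25, 7.0, 1.0], [0.9, 5.0, 1.0], [0.2, 8.0, 0.0]]
thresholds = [0.1, 0.5, 0.5]
# Feature 0's 0.05 shift in the first counterfactual falls below its threshold.
print(local_change_frequency(factual, counterfactuals, thresholds))
```

Without the threshold, feature 0 would count as changed in two of the three counterfactuals instead of one, which is the kind of negligible change the threshold is meant to filter out.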

Abstract

Machine learning models achieve state-of-the-art performance across domains, yet their lack of interpretability limits safe deployment in high-stakes settings. Counterfactual explanations are widely used to provide actionable "what-if" recourse, but they typically remain instance-specific and do not quantify which features systematically drive outcome changes within coherent regions of the feature space or across an entire dataset. We introduce FLEX (Feature importance from Layered counterfactual EXplanations), a model- and domain-agnostic framework that converts sets of counterfactuals into feature-change frequency scores at local, regional, and global levels. FLEX generalises local change-frequency measures by aggregating them across instances and neighbourhoods, yielding interpretable rankings that reflect how often each feature must change to flip a prediction. The framework is compatible with different counterfactual generation methods, so users can emphasise properties such as sparsity, feasibility, or actionability and thereby tailor the derived feature importances to practical constraints. We evaluate FLEX on two contrasting tabular tasks, traffic accident severity prediction and loan approval, and compare it with SHAP- and LIME-derived feature importances. Results show that (i) FLEX's global rankings correlate with SHAP's while surfacing additional drivers, and (ii) regional analyses reveal context-specific factors that global summaries miss. FLEX thus bridges the gap between local recourse and global attribution, supporting transparent, intervention-oriented decision-making in risk-sensitive applications.
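The layered aggregation described in the abstract can be sketched as averaging local scores over different sets of factuals. This is a minimal sketch under stated assumptions, not the authors' implementation. A regional score is assumed to be the per-feature mean of local change frequencies over an anchor factual and its k Euclidean nearest neighbours (Figure 4 uses four neighbours sharing a categorical value; here the caller would pre-filter the factuals). A global score is assumed to be the mean over a random sample of factuals, and the per-feature standard deviation serves as the uncertainty estimate. All function names are hypothetical.

```python
import math
import random
from typing import Sequence


def nearest_neighbours(points: Sequence[Sequence[float]], anchor_idx: int, k: int) -> list[int]:
    """Indices of the anchor followed by its k nearest other points (Euclidean)."""
    anchor = points[anchor_idx]
    others = [i for i in range(len(points)) if i != anchor_idx]
    others.sort(key=lambda i: math.dist(points[i], anchor))
    return [anchor_idx] + others[:k]


def aggregate(local_scores: Sequence[Sequence[float]]) -> tuple[list[float], list[float]]:
    """Per-feature mean and population standard deviation of local change frequencies."""
    n = len(local_scores)
    n_features = len(local_scores[0])
    means = [sum(s[j] for s in local_scores) / n for j in range(n_features)]
    stds = [
        math.sqrt(sum((s[j] - means[j]) ** 2 for s in local_scores) / n)
        for j in range(n_features)
    ]
    return means, stds


def regional_importance(factuals, local_scores, anchor_idx, k=4):
    """Hypothetical regional score: aggregate over the anchor and its k neighbours."""
    idx = nearest_neighbours(factuals, anchor_idx, k)
    return aggregate([local_scores[i] for i in idx])


def global_importance(local_scores, n_samples, seed=0):
    """Hypothetical global score: aggregate over a random sample of factuals."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(local_scores)), n_samples)
    return aggregate([local_scores[i] for i in idx])


# Toy data: five 2D factuals and their (precomputed) local change frequencies.
factuals = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]]
local = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
reg_mean, reg_std = regional_importance(factuals, local, anchor_idx=0, k=2)
glob_mean, glob_std = global_importance(local, n_samples=5)
print(reg_mean, glob_mean)
```

Using the population rather than the sample standard deviation is a design choice; for regions as small as those in Figure 4 (five factuals), the two differ noticeably.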

Paper Structure

This paper contains 20 sections, 6 equations, 6 figures, 3 tables, and 2 algorithms.

Figures (6)

  • Figure 1: Pipeline for computing the FLEX score: start from a factual (or set of factuals), generate counterfactuals, and observe which features change.
  • Figure 2: Demonstration on a toy 2D dataset framed as the accident severity task. (1) Train a binary classifier for accident severity. (2) Select high-severity instances (yellow) and their nearby low-severity counterfactuals (lilac, $N_{\text{cf}}=3$). (3) Regional case: select the nearest neighbours around a point (green/pink regions) and their counterfactuals (lilac). (4) Global case: randomly sample high-severity points across the space and identify their counterfactuals. In each case, feature change frequency quantifies how the counterfactuals differ from their factuals.
  • Figure 3: Example points A, B, C, and D are for illustration only: A indicates low global but high regional importance; B, both low; C, both high; and D, high global but low regional importance. Each point's distance from the line y = x reflects how far its regional importance deviates from its global importance.
  • Figure 4: Mean and standard deviation of local feature change frequencies for four contrastive regions. Each region is formed by randomly selecting a factual instance and its four nearest factual neighbours that share a specific categorical feature value (for example, driving experience of more than 10 years for region 1), with 10 counterfactuals generated per factual. The results are compared with LIME importance scores normalised to [0, 1].
  • Figure 5: Global feature change frequencies plotted against the corresponding values for each region, with Pearson correlation coefficients (r) shown for each region.
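Figures 3 to 5 compare importances across levels and methods. The sketch below assumes global scores on the x-axis and regional scores on the y-axis. It measures each feature's signed perpendicular distance from y = x (Figure 3), computes the Pearson r reported in Figure 5, and min-max normalises a score vector to [0, 1] as described for LIME in Figure 4. The function names and toy values are hypothetical, and only the standard library is used.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def signed_distance_from_diagonal(global_scores, regional_scores):
    """Signed perpendicular distance of each (global, regional) point from y = x.

    Positive: the feature matters more in the region than globally (like point A);
    negative: it matters less in the region (like point D).
    """
    return [(r - g) / math.sqrt(2) for g, r in zip(global_scores, regional_scores)]


def min_max_normalise(scores):
    """Rescale scores to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


global_scores = [0.1, 0.4, 0.8]
regional_scores = [0.7, 0.3, 0.6]
print(pearson_r(global_scores, regional_scores))
print(signed_distance_from_diagonal(global_scores, regional_scores))
print(min_max_normalise([2.0, 4.0, 6.0]))
```

Dividing by √2 turns the vertical gap (regional minus global) into a geometric distance from the diagonal; dropping that constant would not change how features rank by deviation.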