Flexible Counterfactual Explanations with Generative Models

Stig Hellemans, Andres Algaba, Sam Verboven, Vincent Ginis

TL;DR

FCEGAN addresses rigidity in counterfactual explanations by introducing counterfactual templates that let users dynamically specify mutability of features, while enabling black-box operation through historical predictions. The framework combines a GAN-based generator with dual discriminators and a divergence term, integrated with gradient-guided optimization, to produce realistic, valid counterfactuals aligned with user constraints. Experiments on healthcare and finance datasets show improved validity and usable explanations under varying flexibility, albeit with some diversity trade-offs that can be mitigated with divergence controls. The approach offers practical, personalized explanations in real-world, constraint-heterogeneous settings, without requiring retraining or model access, supporting deployment in high-stakes domains and regulated environments.

Abstract

Counterfactual explanations provide actionable insights to achieve desired outcomes by suggesting minimal changes to input features. However, existing methods rely on fixed sets of mutable features, which makes counterfactual explanations inflexible for users with heterogeneous real-world constraints. Here, we introduce Flexible Counterfactual Explanations, a framework incorporating counterfactual templates, which allows users to dynamically specify mutable features at inference time. In our implementation, we use Generative Adversarial Networks (FCEGAN), which align explanations with user-defined constraints without requiring model retraining or additional optimization. Furthermore, FCEGAN is designed for black-box scenarios, leveraging historical prediction datasets to generate explanations without direct access to model internals. Experiments across economic and healthcare datasets demonstrate that FCEGAN significantly improves counterfactual explanations' validity compared to traditional benchmark methods. By integrating user-driven flexibility and black-box compatibility, counterfactual templates support personalized explanations tailored to user constraints.

Paper Structure

This paper contains 18 sections, 1 equation, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Flexible Counterfactual Explanations GAN (FCEGAN). This diagram illustrates an individual denied a loan by a classification model based on input features such as age, work type, and sex. Users receiving an unfavorable prediction can explore which changes to their features might result in a favorable outcome. Unlike existing methods, FCEGAN empowers users to specify which features are mutable via a counterfactual template, providing flexibility in determining actionable changes. The model generates counterfactual explanations based on user preferences without requiring retraining, allowing individuals to identify and experiment with the most suitable feature modifications to achieve their desired prediction.
  • Figure 2: FCEGAN Architecture. The users' original features $x_{og}$, concatenated with their (undesired) predictions, are input to the model. At inference time, the user selects which features are mutable in the counterfactual template. During training, counterfactual templates $x_{tmp}$ are instead generated by randomly setting a fraction of features as mutable, along with specifying a desired target $y_{desired}$, which is especially important in multi-class settings. The original instance $x_{og}$ and counterfactual template $x_{tmp}$ are fed into the flexible counterfactual generator, which outputs the predicted counterfactuals $x_{cf}$. To ensure realism, a combination of two discriminator losses is used: $L_{D_{og}}$, comparing $x_{cf}$ with the original instance $x_{og}$, and $L_{D_{cf}}$, comparing $x_{cf}$ with real counterfactuals from the desired class $x_{desired}$. To limit divergence, a divergence loss $L_{div}$ is applied, computed separately for mutable and immutable features. An optional classifier loss $L_{clas}$ can guide the generator to produce valid counterfactuals aligned with the desired target.
  • Figure 3: Performance of Counterfactual Explanations with Increasing Flexibility. As flexibility increases, a larger proportion of counterfactuals are valid, meaning they achieve the desired class prediction. This is crucial for improving search efficiency, as fewer explanations are discarded due to invalidity. Counterfactual templates substantially enhance this validity across all counterfactual methods compared to their default implementations. Another striking observation is that, when all features are set as mutable, the black-box FCEGAN significantly outperforms its default implementation, demonstrating its capability to enhance performance even in a non-flexible setting. All methods outperform random search (Random Input) by leveraging learned counterfactual characteristics, thereby improving search efficiency. No divergence constraints were applied in this comparison, as they were deemed unnecessary for the analysis. Other parameters: $\lambda_{clas} = (1/0)$, $\lambda_{D_{og}} = 0.5$, $\lambda_{D_{cf}} = 0.5$.
  • Figure 4: Impact of Counterfactual Templates on Counterfactual Quality. The use of counterfactual templates notably increases the fraction of valid counterfactuals, as previously established. This figure further examines the associated changes in quality measures, as described in Section \ref{subsec:quality measures}. Both categorical divergence (changes in categories) and continuous divergence (mean/max percentile shifts) generally increase, though not uniformly. This increase in divergence arises because the model is constrained by immutable features, leading to greater variation in mutable features. The model effectively learns which features can be altered. Fakeness remains largely unchanged, except for certain methods applied to the Adult datasets, where it either increases or decreases depending on the approach. In contrast, counterfactual templates in FCEGAN models notably reduce diversity, which is an undesirable outcome. However, enforcing stricter divergence constraints can help preserve diversity, as shown in Appendix Fig. \ref{fig:divergence-impact}. The presented measures represent the mean area under the curve (AUC) values of the flexibility graphs, as shown in Fig. \ref{fig:flexibility}, across five experiments. Error bars indicate the standard error of the mean (SEM). These values are normalized relative to their default implementations. No divergence constraints were applied during this comparison. $\lambda_{clas} = (1/0)$, $\lambda_{D_{og}} = 0.5$, $\lambda_{D_{cf}} = 0.5$.
  • Figure C1: Impact of Divergence Constraints on FCEGAN. This figure illustrates the effect of applying a divergence constraint through a divergence loss (Eq. \ref{eq:div-loss}). Two levels of constraints, labeled as small and large, are evaluated to demonstrate the impact of increasingly stringent divergence constraints. Generally, the divergence constraint reduces divergence measures, affecting both categorical features (e.g., changes in categories) and continuous features (e.g., mean/max percentile shifts). Additionally, a divergence constraint can enhance diversity, which is decreased in template-based methods, by mitigating mode collapse. The fraction of valid counterfactuals decreases as the divergence constraint becomes more stringent, reflecting the tighter constraints on the model. Notably, the level of fakeness remains largely unaffected across experiments, except for an anomaly observed in the Adult dataset under a small constraint. The presented measures represent the mean area under the curve (AUC) values of the flexibility graphs, as shown in Fig. \ref{fig:flexibility}, across five experiments. Error bars indicate the standard error of the mean (SEM). These values are normalized relative to implementations without divergence. Parameters used: $\lambda_{clas} = (1/0)$, $\lambda_{D_{og}} = 0.5$, $\lambda_{D_{cf}} = 0.5$. Mutable divergence influence ($\lambda_{m}$) values per dataset were as follows: small constraint with classifier: 10; large constraint with classifier: 100; small constraint in black-box setting: 5; and large constraint in black-box setting: 50.
  • ...and 8 more figures
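
The Figure 2 caption describes two mechanisms that a small sketch may help make concrete: sampling a random counterfactual template during training, and computing a divergence term separately for mutable and immutable features. The NumPy code below is a minimal illustration under stated assumptions, not the authors' implementation. The paper's divergence equation is not reproduced in this section, so the mean absolute difference used here is a stand-in, and the names `sample_template_mask`, `split_divergence`, and the immutable weight `lambda_im` are hypothetical. Only $\lambda_{m}$ (the mutable divergence influence) appears in the captions above.

```python
import numpy as np


def sample_template_mask(n_features, mutable_fraction, rng):
    """Randomly mark a fraction of features as mutable (True = mutable)."""
    n_mutable = int(round(mutable_fraction * n_features))
    mask = np.zeros(n_features, dtype=bool)
    # Draw distinct feature indices so exactly n_mutable entries are mutable.
    mask[rng.choice(n_features, size=n_mutable, replace=False)] = True
    return mask


def split_divergence(x_og, x_cf, mask, lambda_m=10.0, lambda_im=100.0):
    """Assumed divergence: weighted mean absolute change, computed
    separately over mutable and immutable features."""
    diff = np.abs(x_cf - x_og)
    mutable_term = diff[mask].mean() if mask.any() else 0.0
    immutable_term = diff[~mask].mean() if (~mask).any() else 0.0
    return lambda_m * mutable_term + lambda_im * immutable_term


rng = np.random.default_rng(0)
x_og = np.array([0.2, 0.5, 0.9, 0.1])      # original instance (scaled features)
mask = sample_template_mask(4, 0.5, rng)   # template: half the features mutable
x_cf = np.where(mask, x_og + 0.1, x_og)    # toy counterfactual: shift mutable only
loss = split_divergence(x_og, x_cf, mask)
```

In this toy case the immutable features are untouched, so only the mutable term contributes to the loss. Weighting the two groups separately is one way a generator could be penalized more heavily for altering features the template marks as fixed.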