CounterFlowNet: From Minimal Changes to Meaningful Counterfactual Explanations

Oleksii Furman, Patryk Marszałek, Jan Masłowski, Piotr Gaiński, Maciej Zięba, Marek Śmieja

TL;DR

Experiments on eight datasets under two evaluation protocols demonstrate that CounterFlowNet achieves superior trade-offs between validity, sparsity, plausibility, and diversity with full satisfaction of the given constraints.

Abstract

Counterfactual explanations (CFs) provide human-interpretable insights into a model's predictions by identifying minimal changes to input features that would alter the model's output. However, existing methods struggle to generate multiple high-quality explanations that (1) affect only a small portion of the features, (2) can be applied to tabular data with heterogeneous features, and (3) are consistent with user-defined constraints. We propose CounterFlowNet, a generative approach that formulates CF generation as sequential feature modification using conditional Generative Flow Networks (GFlowNets). CounterFlowNet is trained to sample CFs proportionally to a user-specified reward function that can encode key CF desiderata (validity, sparsity, proximity, and plausibility), encouraging high-quality explanations. The sequential formulation yields highly sparse edits, while a unified action space seamlessly supports continuous and categorical features. Moreover, actionability constraints, such as immutability and monotonicity of features, can be enforced at inference time via action masking, without retraining. Experiments on eight datasets under two evaluation protocols demonstrate that CounterFlowNet achieves superior trade-offs between validity, sparsity, plausibility, and diversity with full satisfaction of the given constraints.

Paper Structure (35 sections, 18 equations, 6 figures, 8 tables, 1 algorithm)

Figures (6)

  • Figure 1: CounterFlowNet generates multiple counterfactual explanations by framing feature modification as a sequential decision process. Given an original data point (left), CounterFlowNet samples multiple valid CFs (CF1–CF3) proportional to a composite reward, naturally balancing sparsity, proximity, plausibility, and validity without requiring separate optimization.
  • Figure 2: Overview of the CounterFlowNet generative process. Top: The forward policy $P_F$ constructs a counterfactual $\mathbf{x}'$ from source instance $\mathbf{x}_0$ through sequential feature modifications until a stop action terminates the trajectory. Bottom: Each action is factorized into two stages: sampling a feature index $d_t \sim P_F(d \mid s_t, \mathbf{x}_0, y')$ (Stage 1, Eq. \ref{eq:index_policy}), then sampling its new value $v_t \sim P_F(v \mid d_t, s_t, \mathbf{x}_0, y')$ (Stage 2, Eq. \ref{eq:value_policy}).
  • Figure 3: Effect of discretization granularity $B$ on the Adult dataset.
  • Figure 4: Reward component ablation on the Adult dataset. (a) Increasing plausibility weight $\lambda_p$ improves LOF at the cost of proximity. (b-c) Higher proximity weight $\lambda_d$ yields sparser, more plausible CFs with lower diversity. All CounterFlowNet configurations consistently dominate the baseline Pareto frontiers.
  • Figure 5: Ablation of $\lambda_p$ and $\lambda_d$ with the other reward weights fixed ($\lambda_v=1.0$, $\lambda_s=0.4$).
  • ...and 1 more figures
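
The two-stage action factorization in Figure 2, together with the inference-time action masking mentioned in the abstract, can be illustrated with a small sketch. This is not the authors' implementation: it assumes every feature is discretized into the same number of bins, it uses random logits as a stand-in for the learned forward policy $P_F$, and all names (`masked_softmax`, `value_mask`, `sample_counterfactual`, `immutable`, `increasing`) are hypothetical.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax restricted to entries where mask is True; others get probability 0."""
    z = np.where(mask, logits, -np.inf)
    z = z - z[mask].max()          # stabilize; mask must have at least one True
    p = np.exp(z)
    return p / p.sum()

def value_mask(d, x0_bins, n_bins, increasing):
    """Allowed new bins for feature d (assumed constraint encoding)."""
    mask = np.ones(n_bins, dtype=bool)
    mask[x0_bins[d]] = False       # a modification must change the original value
    if d in increasing:
        mask[: x0_bins[d]] = False # monotone feature: only bins above the original
    return mask

def sample_counterfactual(x0_bins, n_bins, immutable, increasing, rng, max_steps=10):
    """Sample one trajectory of (feature index, value) actions until stop."""
    D = len(x0_bins)
    state = x0_bins.copy()
    for _ in range(max_steps):
        # Stage 1: choose a feature index, or the stop action (index D).
        index_mask = np.ones(D + 1, dtype=bool)
        for d in range(D):
            no_values = not value_mask(d, x0_bins, n_bins, increasing).any()
            if d in immutable or no_values:
                index_mask[d] = False
        index_logits = rng.normal(size=D + 1)   # placeholder for the learned policy
        d = int(rng.choice(D + 1, p=masked_softmax(index_logits, index_mask)))
        if d == D:
            break
        # Stage 2: choose the new value for the selected feature.
        value_logits = rng.normal(size=n_bins)  # placeholder for the learned policy
        allowed = value_mask(d, x0_bins, n_bins, increasing)
        state[d] = int(rng.choice(n_bins, p=masked_softmax(value_logits, allowed)))
    return state

rng = np.random.default_rng(0)
x0 = np.array([3, 1, 4, 2])        # original instance as bin indices
samples = [sample_counterfactual(x0, 5, {0}, {1, 2}, rng) for _ in range(200)]
```

Because the masks are built from the constraint sets at sampling time, changing `immutable` or `increasing` changes what can be sampled without touching the policy parameters, which is the property the abstract attributes to action masking.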