Partial counterfactual identification and uplift modeling: theoretical results and real-world assessment

Théo Verhelst; Denis Mercier; Jeevan Shrestha; Gianluca Bontempi

TL;DR

The paper develops uplift-based bounds for the probability of counterfactual events under unconfoundedness, extending Fréchet bounds to the four counterfactual joint outcomes through S0 and S1. It introduces a point estimator for counterfactual probabilities assuming Y0 and Y1 are conditionally independent given X, and provides a hierarchical Bayesian simulator for validation. Through simulation, it shows the uplift bounds are tighter than Fréchet bounds and demonstrates reasonable estimator accuracy; it then validates the approach on a real telecom churn dataset, highlighting practical business insights and limitations. The work offers a practical framework for partial counterfactual identification using uplift modeling, with potential for refinement via more informative features and observational data.

Abstract

Counterfactuals are central in causal human reasoning and the scientific discovery process. The uplift, also called conditional average treatment effect, measures the causal effect of some action, or treatment, on the outcome of an individual. This paper discusses how it is possible to derive bounds on the probability of counterfactual statements based on uplift terms. First, we derive some original bounds on the probability of counterfactuals and we show that the tightness of such bounds depends on how much information the feature set carries about the uplift term. Then, we propose a point estimator based on the assumption of conditional independence between the counterfactual outcomes. The quality of the bounds and of the point estimator is assessed on synthetic data and on a large real-world customer data set provided by a telecom company, showing significant improvement over the state of the art.
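As a rough illustration of the two ideas in the abstract, the sketch below compares classical Fréchet bounds, computed from the marginal outcome probabilities alone, with bounds obtained by averaging per-individual Fréchet bounds over the feature distribution, and adds a point estimate under conditional independence of the counterfactual outcomes. The function names, the arrays `s0` and `s1` (assumed to hold model estimates of P(Y0=1 | x) and P(Y1=1 | x) for each individual), and the averaged form of the bounds are assumptions made for this example; they are not taken from the paper's equations.

```python
import numpy as np


def frechet_bounds(p0, p1):
    """Classical Fréchet bounds on P(Y0=1, Y1=1) from the marginals only."""
    return max(0.0, p0 + p1 - 1.0), min(p0, p1)


def uplift_bounds(s0, s1):
    """Average per-individual Fréchet bounds (assumed form, see lead-in).

    s0[i] and s1[i] are hypothetical estimates of P(Y0=1 | x_i) and
    P(Y1=1 | x_i), e.g. the outputs of an uplift model.
    """
    s0 = np.asarray(s0, dtype=float)
    s1 = np.asarray(s1, dtype=float)
    lower = np.maximum(0.0, s0 + s1 - 1.0).mean()
    upper = np.minimum(s0, s1).mean()
    return float(lower), float(upper)


def point_estimate(s0, s1):
    """Estimate P(Y0=1, Y1=1) assuming Y0 and Y1 are independent given X."""
    s0 = np.asarray(s0, dtype=float)
    s1 = np.asarray(s1, dtype=float)
    return float((s0 * s1).mean())


# Toy scores for four individuals (made-up numbers, for illustration only).
s0 = [0.9, 0.1, 0.5, 0.5]
s1 = [0.8, 0.2, 0.5, 0.5]

print(frechet_bounds(np.mean(s0), np.mean(s1)))
print(uplift_bounds(s0, s1))
print(point_estimate(s0, s1))
```

Because the maximum is convex and the minimum is concave, averaging the per-individual bounds can never give a wider interval than applying the Fréchet bounds to the averaged scores, which is consistent with the abstract's claim that tightness depends on how informative the features are.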

Paper Structure (19 sections, 2 theorems, 40 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Related work
Notation
Bounds on the probability of counterfactuals
Probability bounds and uplift estimation
Point estimate of counterfactual probabilities
Point estimate and uplift estimation
Bounds assessment by simulation
Methodology
Simulation parameters
Assessment of the theoretical results
Sensitivity analysis of the simulation
Evaluation with real data
Data set description
Methodology
...and 4 more sections

Key Result

Theorem 1

As the conditional entropy $H(Y_0,Y_1\mid X)$ approaches zero, the uplift bounds on the probability $P(Y_0=y_0,Y_1=y_1)$ collapse to the exact value of that probability. Conversely, as the conditional entropy $H(Y_0,Y_1\mid X)$ approaches the entropy $H(Y_0,Y_1)$, the uplift bounds reduce to the Fréchet bounds.
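The two limiting regimes in this statement can be checked numerically on toy inputs. The sketch below assumes, as a simplification not taken from the paper's equations, that the uplift bounds on $P(Y_0=1,Y_1=1)$ are the per-individual Fréchet bounds averaged over the data; `bound_span` and the score arrays are hypothetical names and values chosen for the example.

```python
import numpy as np


def bound_span(s0, s1):
    """Width of the averaged per-individual Fréchet bounds (assumed form).

    s0[i], s1[i] stand for P(Y0=1 | x_i) and P(Y1=1 | x_i).
    """
    s0 = np.asarray(s0, dtype=float)
    s1 = np.asarray(s1, dtype=float)
    lower = np.maximum(0.0, s0 + s1 - 1.0).mean()
    upper = np.minimum(s0, s1).mean()
    return float(upper - lower)


# Regime 1: X determines both outcomes, so H(Y0, Y1 | X) = 0 and every
# conditional probability is 0 or 1. The interval collapses to a point.
span_informative = bound_span([1, 0, 1, 0], [1, 1, 0, 0])

# Regime 2: X carries no information, so H(Y0, Y1 | X) = H(Y0, Y1) and the
# conditional probabilities equal the marginals for every individual.
p0, p1 = 0.5, 0.5
span_uninformative = bound_span([p0] * 4, [p1] * 4)

# Width of the classical Fréchet interval built from the marginals alone.
span_frechet = min(p0, p1) - max(0.0, p0 + p1 - 1.0)

print(span_informative, span_uninformative, span_frechet)
```

In the first regime the span is zero, and in the second it matches the width of the classical Fréchet interval; intermediate feature sets would fall between the two.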

Figures (8)

Figure 1: The estimator $\hat{\alpha}$, the true value of $\alpha$, and the bounds on $\alpha$, for different values of $\alpha$. We take the average over all experiments where $\alpha$ falls into the relevant range. The graph is quite similar for $\beta,\gamma$ and $\delta$.
Figure 2: Distribution of the point estimator bias, $\mathbb E[\phi^{(i)}]$, over 4000 simulation runs. Note that this is different from the distribution of $\phi^{(i)}$ in a given simulation run. Although the maximum is around zero, it is never exactly zero, indicating that the estimators are biased in our simulations. This is desirable to reflect violations of the hypotheses underlying our estimators in practical scenarios.
Figure 3: The bounds span as a function of the conditional entropy of $Y_0^{(i)},Y_1^{(i)}$, which is directly influenced by the parameter $A$. We fixed $(\alpha,\beta,\gamma,\delta)=(0.947,0.020,0.017,0.017)$, and $v=50$ and $N=2000$.
Figure 4: The error of the point estimator as a function of the number of samples in the evaluation data set. We fixed $(\alpha,\beta,\gamma,\delta)=(0.947,0.020,0.017,0.017)$, and $v=20$ and $A=1$.
Figure 5: The error of the point estimator as a function of model variance $\mathop{\mathrm{Var}}\nolimits(\widehat{S}_t^{(i)})$. We fixed $(\alpha,\beta,\gamma,\delta)=(0.947,0.020,0.017,0.017)$, and $N=1000$ and $A=10$. As the variance decreases, the estimator bias converges to its theoretical value.
...and 3 more figures

Theorems & Definitions (5)

Definition 1: Unconfoundedness
Theorem 1
proof
Theorem 2
proof
