Identifying counterfactual probabilities using bivariate distributions and uplift modeling
Théo Verhelst, Gianluca Bontempi
TL;DR
This work addresses the problem of recovering the joint counterfactual distribution of potential outcomes, which standard uplift estimates do not provide, by embedding uplift scores in a Bayesian framework with a bivariate beta prior. It derives a posterior over the four counterfactual probabilities p_ij through a latent Dirichlet structure, with separate learning and inference phases, and extends the model with generalized Dirichlet and noisy-prediction variants to capture richer dependence and uncertainty. Evaluation on Gaussian and bivariate-beta simulations shows substantial accuracy gains over baselines, with the generalized variants performing best in some settings, and the method is also applicable in practice to telecom churn data. The approach yields interpretable posteriors for both population-level and individual-level counterfactuals, enabling detection of counterfactual patterns and untapped intervention opportunities that traditional uplift analyses miss.
Abstract
Uplift modeling estimates the causal effect of an intervention as the difference between potential outcomes under treatment and control, whereas counterfactual identification aims to recover the joint distribution of these potential outcomes (e.g., "Would this customer still have churned had we given them a marketing offer?"). This joint counterfactual distribution provides richer information than the uplift but is harder to estimate. However, the two approaches are synergistic: uplift models can be leveraged for counterfactual estimation. We propose a counterfactual estimator that fits a bivariate beta distribution to predicted uplift scores, yielding posterior distributions over counterfactual outcomes. Our approach requires no causal assumptions beyond those of uplift modeling. Simulations show the efficacy of the approach, which can be applied, for example, to the problem of customer churn in telecom, where it reveals insights unavailable to standard ML or uplift models alone.
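
To make the gap between uplift and counterfactual identification concrete, the following minimal sketch builds two joint distributions of the binary potential outcomes that share the same marginals, and therefore the same uplift, yet assign different probabilities to each counterfactual pattern. The indexing convention `joint[i, j] = P(Y0 = i, Y1 = j)`, the function name, and the numeric values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def marginals_and_uplift(joint):
    """Return P(Y0=1), P(Y1=1) and the uplift P(Y1=1) - P(Y0=1).

    Assumed convention (hypothetical, for illustration only):
    joint[i, j] = P(Y0 = i, Y1 = j), where Y0 is the outcome under
    control and Y1 the outcome under treatment.
    """
    joint = np.asarray(joint, dtype=float)
    p_y0 = joint[1, :].sum()  # outcome occurs under control
    p_y1 = joint[:, 1].sum()  # outcome occurs under treatment
    return p_y0, p_y1, p_y1 - p_y0

# Two made-up joint distributions with identical marginals.
joint_a = np.array([[0.6, 0.1],
                    [0.2, 0.1]])
joint_b = np.array([[0.5, 0.2],
                    [0.3, 0.0]])

for name, joint in [("A", joint_a), ("B", joint_b)]:
    p_y0, p_y1, uplift = marginals_and_uplift(joint)
    print(f"{name}: P(Y0=1)={p_y0:.2f}  P(Y1=1)={p_y1:.2f}  uplift={uplift:.2f}  "
          f"P(Y0=1, Y1=0)={joint[1, 0]:.2f}  P(Y0=0, Y1=1)={joint[0, 1]:.2f}")
```

Both distributions give the same uplift, but they disagree on how many individuals would experience the outcome only without treatment, or only with it. Marginal information alone cannot separate the two cases, which is why counterfactual estimation needs additional structure such as the prior described above.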
