Synthetic Counterfactual Labels for Efficient Conformal Counterfactual Inference
Amirmohammad Farzaneh, Matteo Zecchin, Osvaldo Simeone
TL;DR
The paper targets reliable, finite-sample, distribution-free prediction intervals for counterfactual outcomes under treatment imbalance. It introduces SP-CCI, which augments the calibration set with synthetic counterfactual labels generated from a pre-trained model and uses a debiased miscoverage estimator based on Prediction-Powered Inference and Risk-Controlling Prediction Sets to preserve marginal coverage. The approach yields consistently tighter prediction intervals than standard CCI, with theoretical guarantees under exact and approximate importance weighting. Empirical results on synthetic data and the IHDP benchmark demonstrate substantial efficiency gains, highlighting practical value for high-stakes decision-making with limited counterfactual data.
Abstract
This work addresses the problem of constructing reliable prediction intervals for individual counterfactual outcomes. Existing conformal counterfactual inference (CCI) methods provide marginal coverage guarantees but often produce overly conservative intervals, particularly under treatment imbalance, when counterfactual samples are scarce. We introduce synthetic data-powered CCI (SP-CCI), a new framework that augments the calibration set with synthetic counterfactual labels generated by a pre-trained counterfactual model. To ensure validity, SP-CCI incorporates synthetic samples into a conformal calibration procedure based on risk-controlling prediction sets (RCPS) with a debiasing step informed by prediction-powered inference (PPI). We prove that SP-CCI achieves tighter prediction intervals while preserving marginal coverage, with theoretical guarantees under both exact and approximate importance weighting. Empirical results on synthetic data and the IHDP benchmark confirm that SP-CCI consistently reduces interval width compared to standard CCI across all settings.
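To make the debiasing idea concrete, the following is a minimal sketch of a PPI-style miscoverage estimate, not the paper's exact estimator. It assumes nonconformity scores have already been computed, that a small set of calibration points has both a real and a synthetic counterfactual label, and that a larger pool has synthetic labels only. The function name `ppi_miscoverage` and its arguments are hypothetical, and the sketch omits the importance weights and the RCPS confidence bound that the abstract describes.

```python
import numpy as np


def ppi_miscoverage(lam, real_scores, synth_scores_paired, synth_scores_extra):
    """PPI-style estimate of the miscoverage P(score > lam).

    Illustrative assumption: all inputs are 1-D arrays of nonconformity scores.
    real_scores and synth_scores_paired come from the same calibration points.
    """
    real_scores = np.asarray(real_scores, dtype=float)
    synth_scores_paired = np.asarray(synth_scores_paired, dtype=float)
    synth_scores_extra = np.asarray(synth_scores_extra, dtype=float)

    # Plug-in miscoverage computed on the large synthetic-only pool.
    synth_term = np.mean(synth_scores_extra > lam)

    # Rectifier: mean gap between real and synthetic miscoverage indicators
    # on the points where both labels are available.
    rectifier = np.mean(
        (real_scores > lam).astype(float)
        - (synth_scores_paired > lam).astype(float)
    )

    return float(synth_term + rectifier)


# Toy usage with made-up scores (not data from the paper).
estimate = ppi_miscoverage(
    lam=3.5,
    real_scores=[1, 2, 3, 4],
    synth_scores_paired=[1, 2, 4, 5],
    synth_scores_extra=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
)
print(estimate)
```

In this toy call the synthetic-only pool overstates miscoverage relative to the real labels, and the rectifier pulls the estimate down accordingly. A full calibration procedure would additionally weight the terms by importance weights and choose the threshold using a high-probability bound on this estimate.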
