Calibrated and Conformal Propensity Scores for Causal Effect Estimation

Shachi Deshpande; Volodymyr Kuleshov

TL;DR

This work addresses how miscalibrated propensity-score models can bias causal effect estimation in observational studies. It introduces a simple post-hoc recalibration framework that learns a recalibrator $R$ on top of a propensity model $Q$ to form $R\circ Q$, yielding calibrated treatment probabilities and improved uncertainty quantification. The paper establishes that calibration is a necessary condition for unbiased estimates from inverse propensity weighted (IPTW) estimators and, under reasonable conditions, for accurate estimates from doubly robust (AIPW) estimators, and it provides error bounds that tighten as calibration improves. Empirically, calibrated propensities reduce estimation bias and variance across drug, image, and GWAS tasks, and speed up GWAS analysis by more than two-fold by enabling simpler models that are faster to train while maintaining accuracy.

Abstract

Propensity scores are commonly used to estimate treatment effects from observational data. We argue that the probabilistic output of a learned propensity score model should be calibrated -- i.e., a predictive treatment probability of 90% should correspond to 90% of individuals being assigned to the treatment group -- and we propose simple recalibration techniques to ensure this property. We prove that calibration is a necessary condition for unbiased treatment effect estimation when using popular inverse propensity weighted and doubly robust estimators. We derive error bounds on causal effect estimates that directly relate to the quality of uncertainties provided by the probabilistic propensity score model and show that calibration strictly improves this error bound while also avoiding extreme propensity weights. We demonstrate improved causal effect estimation with calibrated propensity scores in several tasks including high-dimensional image covariates and genome-wide association studies (GWASs). Calibrated propensity scores improve the speed of GWAS analysis by more than two-fold by enabling the use of simpler models that are faster to train.
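
The recalibrate-then-weight idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the recalibrator here is plain histogram binning (one possible choice, picked for brevity), the function names `fit_histogram_recalibrator` and `iptw_ate` are hypothetical, the toy data are invented, and the recalibrator is fit on the same data it is applied to, whereas in practice a held-out calibration set would normally be used.

```python
def fit_histogram_recalibrator(scores, treatments, n_bins=10):
    """Learn R: map each score bin to the observed treatment frequency in it.

    Histogram binning is an assumption of this sketch, not necessarily
    the recalibrator used in the paper.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for q, t in zip(scores, treatments):
        b = min(int(q * n_bins), n_bins - 1)
        sums[b] += t
        counts[b] += 1
    # Empty bins fall back to the bin midpoint (an arbitrary choice here).
    table = [
        sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]

    def recalibrate(q):
        return table[min(int(q * n_bins), n_bins - 1)]

    return recalibrate


def iptw_ate(outcomes, treatments, propensities, eps=1e-6):
    """Inverse propensity weighted estimate of the average treatment effect."""
    total = 0.0
    for y, t, p in zip(outcomes, treatments, propensities):
        p = min(max(p, eps), 1 - eps)  # guard against division by zero
        total += t * y / p - (1 - t) * y / (1 - p)
    return total / len(outcomes)


# Invented toy data: covariate X in {0, 1}, outcome Y = 2*T + X (true ATE = 2).
# True treatment rates are 25% (X=0) and 75% (X=1), but the overconfident
# model Q reports 0.1 and 0.9 instead.
rows = (
    [(0, 1)] * 1 + [(0, 0)] * 3   # (X, T) pairs with X = 0
    + [(1, 1)] * 3 + [(1, 0)] * 1  # (X, T) pairs with X = 1
)
treatments = [t for _, t in rows]
outcomes = [2 * t + x for x, t in rows]
q_scores = [0.1 if x == 0 else 0.9 for x, _ in rows]

R = fit_histogram_recalibrator(q_scores, treatments)
calibrated_scores = [R(q) for q in q_scores]  # this is R∘Q

raw_ate = iptw_ate(outcomes, treatments, q_scores)
calibrated_ate = iptw_ate(outcomes, treatments, calibrated_scores)
print(raw_ate, calibrated_ate)
```

On this toy data the uncalibrated scores give an estimate of about 2.5, while the recalibrated scores recover the true effect of 2. The recalibrated weights are also less extreme (at most 1/0.25 rather than 1/0.1), in line with the abstract's remark about avoiding extreme propensity weights.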

Paper Structure (56 sections, 17 theorems, 36 equations, 4 figures, 10 tables, 2 algorithms)

INTRODUCTION
BACKGROUND
Notation
Causal Effect Estimation Using Propensity Scoring
Calibrated and Conformal Prediction
Calibration.
Calibrated and Conformal Prediction.
CALIBRATED PROPENSITY SCORES
Calibration: A Necessary Condition for Propensity Scoring Model
Calibrated Uncertainties Improve Propensity Scoring Models
Error Bound on Causal Effect Estimates
Calibration Reduces Variance of Estimators
Calibration Improves Error Bounds
Calibration and Accurate Causal Effect Estimation
ALGORITHMS FOR CALIBRATED PROPENSITY SCORING
...and 41 more sections

Key Result

Theorem 3.1

When $Q(T|X)$ is not calibrated, there exists an outcome function such that an IPTW estimator based on $Q$ yields an incorrect estimate of the true causal effect almost surely.
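
A small worked example may help make this concrete. It is an illustration only, not the paper's own proof, and the constants and outcome function are chosen here purely for exposition. Assume a binary treatment with true propensity $P(T=1|X)=0.5$ for every $X$, a model that always outputs $Q(T=1|X)=0.8$, and the outcome $Y=T$, so the true average treatment effect is 1. Using the standard IPTW form, the large-sample limit of the estimate is:

```latex
\begin{aligned}
\hat{\tau}_{\mathrm{IPTW}}
  &= \mathbb{E}\!\left[\frac{T\,Y}{Q(T=1\mid X)}
       - \frac{(1-T)\,Y}{1 - Q(T=1\mid X)}\right] \\
  &= \frac{P(T=1)\cdot 1}{0.8} - \frac{P(T=0)\cdot 0}{0.2} \\
  &= \frac{0.5}{0.8} = 0.625 \neq 1 .
\end{aligned}
```

Replacing the model output 0.8 with the calibrated value 0.5 gives $0.5/0.5 = 1$, which matches the true effect in this example.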

Figures (4)

Figure 1: Recalibrating Propensity Score Model Reduces the Bias in Estimating Treatment Effect from Observational Data.
Figure 2: Calibration of propensity score model for Drug Effectiveness Study.
Figure : Calibrated Propensity Scoring
Figure : Recalibration Step

Theorems & Definitions (31)

Theorem 3.1
proof : Example
Theorem 3.2
Lemma 3.3
proof : Proof (Sketch)
Corollary 3.4
Theorem 3.5
proof : Proof (Sketch)
Theorem 3.6
proof : Proof (Sketch)
...and 21 more
