Calibrated and Conformal Propensity Scores for Causal Effect Estimation

Shachi Deshpande, Volodymyr Kuleshov

TL;DR

This work addresses how miscalibrated propensity-score models can bias causal effect estimation in observational studies. It introduces a simple, post-hoc recalibration framework that learns a recalibrator $R$ and composes it with the propensity model $Q$ to form $R\circ Q$, yielding calibrated treatment probabilities and improved uncertainty quantification. The paper proves that calibration is a necessary condition for unbiased inverse propensity weighted (IPTW) estimates and, under reasonable conditions, for accurate doubly robust (AIPW) estimates, and it provides error bounds that tighten as calibration improves. Empirically, calibrated propensities improve causal effect estimates across drug, image, and GWAS tasks, and speed up GWAS analysis by more than two-fold by enabling simpler models that are faster to train.

Abstract

Propensity scores are commonly used to estimate treatment effects from observational data. We argue that the probabilistic output of a learned propensity score model should be calibrated -- i.e., a predictive treatment probability of 90% should correspond to 90% of individuals being assigned the treatment group -- and we propose simple recalibration techniques to ensure this property. We prove that calibration is a necessary condition for unbiased treatment effect estimation when using popular inverse propensity weighted and doubly robust estimators. We derive error bounds on causal effect estimates that directly relate to the quality of uncertainties provided by the probabilistic propensity score model and show that calibration strictly improves this error bound while also avoiding extreme propensity weights. We demonstrate improved causal effect estimation with calibrated propensity scores in several tasks including high-dimensional image covariates and genome-wide association studies (GWASs). Calibrated propensity scores improve the speed of GWAS analysis by more than two-fold by enabling the use of simpler models that are faster to train.
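
To make the calibration property above concrete, the following is a minimal sketch of post-hoc recalibration on synthetic data. It uses histogram binning as the recalibrator, which is one common choice and not necessarily the technique the paper uses. The data-generating process, the overconfident score formula, and all function names (`fit_binning_recalibrator`, `expected_calibration_error`, `simulate`) are assumptions made for illustration.

```python
import random

def bin_index(q, n_bins):
    """Map a probability in [0, 1] to one of n_bins equal-width bins."""
    return min(int(q * n_bins), n_bins - 1)

def fit_binning_recalibrator(scores, treatments, n_bins=10):
    """Fit a histogram-binning recalibrator R on held-out (score, treatment) pairs.

    R maps a raw score to the observed treatment frequency in its bin.
    """
    counts = [0] * n_bins
    treated = [0] * n_bins
    for q, t in zip(scores, treatments):
        b = bin_index(q, n_bins)
        counts[b] += 1
        treated[b] += t
    # Empty bins fall back to the bin midpoint.
    values = [
        treated[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]
    return lambda q: values[bin_index(q, n_bins)]

def expected_calibration_error(scores, treatments, n_bins=10):
    """Bin-weighted mean of |average score - observed treatment frequency|."""
    score_sums = [0.0] * n_bins
    treated = [0] * n_bins
    counts = [0] * n_bins
    for q, t in zip(scores, treatments):
        b = bin_index(q, n_bins)
        score_sums[b] += q
        treated[b] += t
        counts[b] += 1
    n = len(scores)
    return sum(abs(score_sums[b] - treated[b]) / n
               for b in range(n_bins) if counts[b])

rng = random.Random(0)

def simulate(n):
    """Synthetic data: true treatment probability p, overconfident raw score q."""
    scores, treatments = [], []
    for _ in range(n):
        p = rng.uniform(0.05, 0.95)           # true P(T=1 | x), hypothetical
        q = p ** 2 / (p ** 2 + (1 - p) ** 2)  # raw score pushed toward 0 or 1
        scores.append(q)
        treatments.append(1 if rng.random() < p else 0)
    return scores, treatments

cal_scores, cal_t = simulate(10000)    # held-out split used to fit R
test_scores, test_t = simulate(10000)  # separate split used to evaluate

R = fit_binning_recalibrator(cal_scores, cal_t)
recal_scores = [R(q) for q in test_scores]  # R composed with Q

ece_raw = expected_calibration_error(test_scores, test_t)
ece_recal = expected_calibration_error(recal_scores, test_t)
print(f"calibration error before: {ece_raw:.3f}, after: {ece_recal:.3f}")
```

In this sketch the recalibrator is fit on a split separate from the one used for evaluation, so the reported calibration error is not measured on the data that determined the bin values.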

Paper Structure

This paper contains 56 sections, 17 theorems, 36 equations, 4 figures, 10 tables, 2 algorithms.

Key Result

Theorem 3.1

When $Q(T|X)$ is not calibrated, there exists an outcome function such that an IPTW estimator based on $Q$ yields an incorrect estimate of the true causal effect almost surely.
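
As a small numeric illustration of the kind of failure the theorem describes (not the paper's own construction), the sketch below computes the population value of an IPTW estimate exactly for a binary covariate. The covariate distribution, true propensities, potential outcomes, and the miscalibrated scores are all hypothetical values chosen for this example.

```python
# Hypothetical population: binary covariate X with P(X=0) = P(X=1) = 0.5.
p_x = {0: 0.5, 1: 0.5}
true_propensity = {0: 0.2, 1: 0.8}   # P(T=1 | X=x)
y1 = {0: 1.0, 1: 2.0}                # potential outcome under treatment
y0 = {0: 0.0, 1: 1.0}                # potential outcome under control

true_ate = sum(p_x[x] * (y1[x] - y0[x]) for x in p_x)

def iptw_population_value(q):
    """Expected value of the IPTW estimate T*Y/Q(X) - (1-T)*Y/(1-Q(X)).

    Taking the expectation over T given X replaces T by the true propensity,
    so the result is exact for this population (no sampling noise).
    """
    total = 0.0
    for x in p_x:
        e = true_propensity[x]
        treated_term = e * y1[x] / q[x]
        control_term = (1 - e) * y0[x] / (1 - q[x])
        total += p_x[x] * (treated_term - control_term)
    return total

# Scores equal to the true propensity: the IPTW value matches the true effect.
ate_exact_scores = iptw_population_value(true_propensity)

# Overconfident scores: Q says 0.1 where 20% are treated and 0.9 where 80% are,
# so Q is not calibrated, and the IPTW value drifts away from the true effect.
miscalibrated = {0: 0.1, 1: 0.9}
ate_miscalibrated = iptw_population_value(miscalibrated)

print(f"true ATE: {true_ate:.3f}")
print(f"IPTW with true propensities: {ate_exact_scores:.3f}")
print(f"IPTW with miscalibrated scores: {ate_miscalibrated:.3f}")
```

Here the true effect is 1, while the miscalibrated scores give roughly 0.89. The example only shows that miscalibration can bias IPTW for one outcome function; consistent with the abstract's wording, calibration is a necessary condition and this sketch does not suggest it is sufficient on its own.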

Figures (4)

  • Figure 1: Recalibrating Propensity Score Model Reduces the Bias in Estimating Treatment Effect from Observational Data.
  • Figure 2: Calibration of propensity score model for Drug Effectiveness Study.
  • Figure: Calibrated Propensity Scoring
  • Figure: Recalibration Step

Theorems & Definitions (31)

  • Theorem 3.1
  • Proof (Example)
  • Theorem 3.2
  • Lemma 3.3
  • Proof (Sketch)
  • Corollary 3.4
  • Theorem 3.5
  • Proof (Sketch)
  • Theorem 3.6
  • Proof (Sketch)
  • ...and 21 more