Smaller Confidence Intervals From IPW Estimators via Data-Dependent Coarsening

Alkis Kalavasis, Anay Mehrotra, Manolis Zampetakis

TL;DR

This work addresses the fragility of inverse propensity-score weighted estimators to inaccuracies in propensity scores and the presence of outliers in covariates. It introduces Coarse IPW (CIPW) estimators, defined via data-dependent partitions of the covariate space, and proves that under mild Lipschitz and sparsity conditions, a learning procedure can achieve an RMSE of order $O\left(\varepsilon + 1/\sqrt{n}\right)$ when propensity scores are perturbed by at most $\varepsilon$, and a robust RMSE of similar order in the presence of outliers. The authors establish both algorithmic guarantees and fundamental hardness results: Min-RMSE is NP-hard to optimize or approximate, and learning good-local partitions from finite data is information-theoretically hard without data dependence. They show asymptotic normality for CIPW estimators built on good-local partitions and provide explicit bias and variance bounds that justify the robustness of CIPW. Overall, CIPW offers provably robust, data-dependent confidence intervals for the average treatment effect in observational studies, outperforming traditional IPW and trimmed-IPW approaches and aligning with information-theoretic lower bounds in favorable regimes.

Abstract

Inverse propensity-score weighted (IPW) estimators are prevalent in causal inference for estimating average treatment effects in observational studies. Under unconfoundedness, given accurate propensity scores and $n$ samples, the size of confidence intervals of IPW estimators scales down with $n$, and several of their variants improve the rate of scaling. However, neither IPW estimators nor their variants are robust to inaccuracies: even if a single covariate has an $\varepsilon>0$ additive error in the propensity score, the size of confidence intervals of these estimators can increase arbitrarily. Moreover, even without errors, the rate at which the confidence intervals of these estimators go to zero with $n$ can be arbitrarily slow in the presence of extreme propensity scores (those close to 0 or 1). We introduce a family of Coarse IPW (CIPW) estimators that captures existing IPW estimators and their variants. Each CIPW estimator is an IPW estimator on a coarsened covariate space, where certain covariates are merged. Under mild assumptions, e.g., Lipschitzness in expected outcomes and sparsity of extreme propensity scores, we give an efficient algorithm to find a robust estimator: given $\varepsilon$-inaccurate propensity scores and $n$ samples, its confidence interval size scales with $\varepsilon+1/\sqrt{n}$. In contrast, under the same assumptions, existing estimators' confidence interval sizes are $\Omega(1)$ irrespective of $\varepsilon$ and $n$. Crucially, our estimator is data-dependent, and we show that no data-independent CIPW estimator can be robust to inaccuracies.
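
To make the idea of "an IPW estimator on a coarsened covariate space" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`ipw_ate`, `coarse_ipw_ate`), the toy data, and the hand-picked partition are all hypothetical, and the sketch assumes that a merged cell's propensity is the average of the given propensity scores over the samples falling in that cell. The paper's own procedure learns the partition from data; here it is simply supplied.

```python
from collections import defaultdict


def ipw_ate(samples, propensity):
    """Standard IPW estimate of the average treatment effect.

    samples: list of (x, t, y) with treatment indicator t in {0, 1}.
    propensity: dict mapping covariate x to e(x) = Pr[T = 1 | X = x].
    """
    total = 0.0
    for x, t, y in samples:
        e = propensity[x]
        # Treated outcomes are weighted by 1/e(x), controls by 1/(1 - e(x)).
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / len(samples)


def coarse_ipw_ate(samples, propensity, cell_of):
    """IPW after merging covariates into cells (hypothetical helper).

    cell_of: dict mapping covariate x to the label of its cell.
    Assumption: a cell's propensity is the average of e(x) over the
    samples in that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, _, _ in samples:
        sums[cell_of[x]] += propensity[x]
        counts[cell_of[x]] += 1
    cell_propensity = {c: sums[c] / counts[c] for c in sums}
    # Every covariate inherits the propensity of the cell it was merged into.
    coarse = {x: cell_propensity[cell_of[x]] for x, _, _ in samples}
    return ipw_ate(samples, coarse)


# Toy data: treated units have outcome 1 and controls have outcome 0.
# Covariate "c" has an extreme propensity score close to 0.
samples = [("a", 1, 1.0), ("a", 0, 0.0), ("b", 1, 1.0), ("b", 0, 0.0), ("c", 1, 1.0)]
propensity = {"a": 0.5, "b": 0.5, "c": 0.01}

identity = {"a": "a", "b": "b", "c": "c"}   # no merging: plain IPW
merged = {"a": "a", "b": "bc", "c": "bc"}   # merge the outlier "c" into "b"

print("IPW:", ipw_ate(samples, propensity))
print("CIPW, identity partition:", coarse_ipw_ate(samples, propensity, identity))
print("CIPW, merged partition:", coarse_ipw_ate(samples, propensity, merged))
```

With the identity partition the coarse estimator coincides with standard IPW. Merging the extreme covariate into a neighboring cell removes the very large weight $1/0.01$, which is the intuition behind coarsening; in general this trades some bias for lower variance, which is why assumptions such as Lipschitz expected outcomes matter.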

Paper Structure

This paper contains 65 sections, 25 theorems, 133 equations, 1 figure, 3 tables, and 1 algorithm.

Key Result

Theorem 1.3

If $\mathsf{P} \neq \mathsf{NP}$, then there is no exponential-factor approximation algorithm for Min-RMSE, i.e., there is no algorithm that, given an instance of Min-RMSE with bit-complexity $b$, ... (Footnote: the bit complexity of $A$ is the number of bits required to encode $A$ using the standard binary encoding ...)

Figures (1)

  • Figure 1: Illustrations of the sparsity and isolation assumptions. Outlier covariates are in red. Inliers are hidden.

Theorems & Definitions (47)

  • Theorem 1.3: Hardness of Approximation
  • Definition 1: Good-Local Partition
  • Lemma 1.3: Robust RMSE of Good-Local Partition
  • Lemma 1.4: Informal; see the section on statistical hardness
  • Lemma 1.4: Impossible to Weakly Beat IPW
  • Definition 2: CIPW Estimators
  • Theorem 4.1: Main Algorithmic Result
  • Theorem 5.1: Bias and Variance of CIPW Estimators
  • proof
  • Remark 5.2: Recovering the Standard IPW Estimator
  • ...and 37 more