Smaller Confidence Intervals From IPW Estimators via Data-Dependent Coarsening

Alkis Kalavasis; Anay Mehrotra; Manolis Zampetakis

TL;DR

This work addresses the fragility of inverse propensity-score weighted estimators to inaccuracies in propensity scores and the presence of outliers in covariates. It introduces Coarse IPW (CIPW) estimators, defined via data-dependent partitions of the covariate space, and proves that under mild Lipschitz and sparsity conditions, a learning procedure can achieve an RMSE of order $O\left(\varepsilon + 1/\sqrt{n}\right)$ when propensity scores are perturbed by at most $\varepsilon$, and a robust RMSE of similar order in the presence of outliers. The authors establish both algorithmic guarantees and fundamental hardness results: Min-RMSE is NP-hard to optimize or approximate, and learning good-local partitions from finite data is information-theoretically hard without data dependence. They show asymptotic normality for CIPW estimators built on good-local partitions and provide explicit bias and variance bounds that justify the robustness of CIPW. Overall, CIPW offers provably robust, data-dependent confidence intervals for the average treatment effect in observational studies, outperforming traditional IPW and trimmed-IPW approaches and aligning with information-theoretic lower bounds in favorable regimes.

Abstract

Inverse propensity-score weighted (IPW) estimators are prevalent in causal inference for estimating average treatment effects in observational studies. Under unconfoundedness, given accurate propensity scores and $n$ samples, the size of confidence intervals of IPW estimators scales down with $n$, and several of their variants improve the rate of scaling. However, neither IPW estimators nor their variants are robust to inaccuracies: even if the propensity score of a single covariate has an additive error of $\varepsilon>0$, the size of confidence intervals of these estimators can increase arbitrarily. Moreover, even without errors, the rate at which the confidence intervals of these estimators go to zero with $n$ can be arbitrarily slow in the presence of extreme propensity scores (those close to 0 or 1). We introduce a family of Coarse IPW (CIPW) estimators that captures existing IPW estimators and their variants. Each CIPW estimator is an IPW estimator on a coarsened covariate space, where certain covariates are merged. Under mild assumptions, e.g., Lipschitzness in expected outcomes and sparsity of extreme propensity scores, we give an efficient algorithm to find a robust estimator: given $\varepsilon$-inaccurate propensity scores and $n$ samples, its confidence interval size scales with $\varepsilon+1/\sqrt{n}$. In contrast, under the same assumptions, existing estimators' confidence interval sizes are $\Omega(1)$ irrespective of $\varepsilon$ and $n$. Crucially, our estimator is data-dependent and we show that no data-independent CIPW estimator can be robust to inaccuracies.
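To make the idea of "an IPW estimator on a coarsened covariate space" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the function names `ipw_ate` and `coarse_ipw_ate` are hypothetical, the partition (`cell`) is supplied by the caller rather than learned from data, and the propensity of a merged cell is taken to be the within-cell sample mean of the given scores, which is one natural choice that the abstract does not spell out.

```python
import numpy as np


def ipw_ate(y, t, e):
    """Standard IPW (Horvitz-Thompson) estimate of the average treatment effect.

    y: outcomes, t: binary treatment indicators, e: propensity scores in (0, 1).
    """
    y, t, e = (np.asarray(a, dtype=float) for a in (y, t, e))
    return float(np.mean(t * y / e - (1.0 - t) * y / (1.0 - e)))


def coarse_ipw_ate(y, t, e, cell):
    """IPW on a coarsened covariate space (illustrative assumption).

    Units that share a label in `cell` are treated as one merged covariate and
    receive a common propensity score: the mean of `e` within that cell.
    """
    e = np.asarray(e, dtype=float)
    # Map arbitrary cell labels to consecutive integer ids 0..k-1.
    _, cell_id = np.unique(np.asarray(cell), return_inverse=True)
    cell_mean = np.bincount(cell_id, weights=e) / np.bincount(cell_id)
    return ipw_ate(y, t, cell_mean[cell_id])


# Usage: an extreme score (0.01) is merged with a moderate neighbour (0.49),
# so no unit is weighted by 1 / 0.01 in the coarse estimate.
y = np.array([1.0, 2.0, 3.0, 4.0])
t = np.array([1, 0, 1, 0])
e = np.array([0.01, 0.49, 0.5, 0.5])
print(ipw_ate(y, t, e), coarse_ipw_ate(y, t, e, cell=[0, 0, 1, 1]))
```

With singleton cells the coarse estimator reduces to standard IPW. Merging cells removes extreme weights at the cost of a bias that depends on how much expected outcomes vary within a cell, which is where a Lipschitz-type assumption would enter.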

Paper Structure

This paper contains 65 sections, 25 theorems, 133 equations, 1 figure, 3 tables, and 1 algorithm.

Introduction
Issue I: Inaccuracies.
Issue II: Outliers.
Our Main Result: Estimation Robust to Inaccuracies and Outliers
Other Contributions
Properties of CIPW Estimators
CIPW Estimators.
Robust Root Mean Squared Error.
Computational Complexity.
A Criterion Guaranteeing Small Robust RMSE.
Learning a Good-Local Partition.
Need for Data Dependence
Related Work
Preliminaries
Causal Inference Setup.
...and 50 more sections

Key Result

Theorem 1.3

If $\mathsf{P} \neq \mathsf{NP}$, then there is no exponential-factor approximation algorithm for Min-RMSE, i.e., there is no algorithm that, given an instance of Min-RMSE with bit-complexity $b$, ... (Footnote: The bit complexity of $A$ is the number of bits required to encode $A$ using the standard binary encoding.)

Figures (1)

Figure 1: Illustrations of the sparsity and isolation assumptions. Outlier covariates are in red. Inliers are hidden.

Theorems & Definitions (47)

Theorem 1.3: Hardness of Approximation
Definition 1: Good-Local Partition
Lemma 1.3: Robust RMSE of Good-Local Partition
Lemma 1.4: Informal; see the section on statistical hardness
Lemma 1.4: Impossible to Weakly Beat IPW
Definition 2: CIPW Estimators
Theorem 4.1: Main Algorithmic Result
Theorem 5.1: Bias and Variance of CIPW Estimators
proof
Remark 5.2: Recovering the Standard IPW Estimator
...and 37 more
