
Penalized Empirical Likelihood for Doubly Robust Causal Inference under Contamination in High Dimensions

Byeonghee Lee, Sangwook Kang, Ju-Hyun Park, Saebom Jeon, Joonsung Kang

Abstract

We propose a doubly robust estimator for the average treatment effect (ATE) in high-dimensional, low-sample-size (HDLSS) observational studies, where contamination and model misspecification pose serious inferential challenges. The estimator combines bounded-influence estimating equations for the outcome model with covariate balancing propensity scores for treatment assignment, embedded within a penalized empirical likelihood framework with nonconvex regularization. It satisfies the oracle property, jointly achieving consistency when either the outcome model or the propensity score model is correctly specified, selection consistency, robustness to contamination, and asymptotic normality. For uncertainty quantification, we derive a finite-sample confidence interval using cumulant generating functions and influence function corrections, avoiding reliance on asymptotic approximations. Simulation studies and applications to the Golub and Khan gene expression datasets demonstrate superior performance in bias, error metrics, and interval calibration, highlighting the method's robustness and inferential validity in HDLSS regimes. Notably, even in the absence of contamination, the proposed estimator and its confidence interval remain efficient relative to those of competing methods.
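The abstract combines three ingredients: a bounded-influence outcome model, covariate balancing propensity scores, and a doubly robust combination of the two. The sketch below shows a low-dimensional, unpenalized version of each in NumPy, under assumptions that are not taken from the paper: Huber's score with tuning constant c = 1.345 stands in for the bounded-influence estimating equations, linear outcome models are fitted separately in each treatment arm, the propensity score is the exactly identified logistic CBPS, and the two are combined with the standard augmented inverse probability weighting (AIPW) score. It omits the penalized empirical likelihood layer and the nonconvex penalty, so it is not the authors' estimator and will not handle p > n; all function names are hypothetical.

```python
import numpy as np


def huber_regression(X, y, c=1.345, n_iter=200, tol=1e-8):
    """Huber M-estimate of (intercept, slopes) by iteratively reweighted least squares.

    Residuals beyond c robust-scale units are down-weighted by c/|u|, so each
    observation's influence on the fit is bounded. The scale is re-estimated
    at every step with the normalized median absolute deviation (MAD).
    """
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(Xd, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        r = y - Xd @ beta
        scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12
        u = np.abs(r) / scale
        w = np.minimum(1.0, c / np.maximum(u, 1e-12))  # Huber weights psi(u)/u
        WX = Xd * w[:, None]
        beta_new = np.linalg.solve(Xd.T @ WX, WX.T @ y)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta


def cbps_exact(X, t, n_iter=100):
    """Exactly identified covariate balancing propensity score with a logistic link.

    Solves sum_i (t_i/e_i - (1 - t_i)/(1 - e_i)) x_i = 0, which is the stationarity
    condition of the convex loss sum_i [exp(-s_i eta_i) - s_i eta_i], s_i = 2 t_i - 1.
    """
    Xd = np.column_stack([np.ones(len(t)), X])
    s = 2.0 * t - 1.0

    def loss(b):
        eta = Xd @ b
        return np.sum(np.exp(-s * eta) - s * eta)

    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        curv = np.exp(-s * (Xd @ beta))
        grad = Xd.T @ (-s * (curv + 1.0))  # minus the covariate balance residual
        if np.max(np.abs(grad)) < 1e-8 * len(t):
            break
        hess = (Xd * curv[:, None]).T @ Xd
        step = np.linalg.solve(hess, grad)
        # Armijo backtracking: damped Newton steps always decrease the convex loss
        f0, rate = loss(beta), 1.0
        while loss(beta - rate * step) > f0 - 0.25 * rate * (grad @ step) and rate > 1e-8:
            rate *= 0.5
        beta = beta - rate * step
    return 1.0 / (1.0 + np.exp(-(Xd @ beta)))


def aipw_ate(X, t, y, c=1.345):
    """Doubly robust (AIPW) ATE from CBPS weights and per-arm Huber outcome fits."""
    e = cbps_exact(X, t)
    Xd = np.column_stack([np.ones(len(y)), X])
    m1 = Xd @ huber_regression(X[t == 1], y[t == 1], c)
    m0 = Xd @ huber_regression(X[t == 0], y[t == 0], c)
    scores = m1 - m0 + t * (y - m1) / e - (1 - t) * (y - m0) / (1 - e)
    return scores.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 3))
    p_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    t = (rng.uniform(size=n) < p_true).astype(float)
    y = 1.0 + X @ np.array([1.0, -1.0, 0.5]) + 2.0 * t + rng.normal(size=n)
    print(aipw_ate(X, t, y))  # the true ATE in this simulation is 2.0
```

The AIPW score remains consistent if either nuisance model is correctly specified, which is what makes the combination step independent of how each nuisance model is fitted; in an HDLSS setting the two fits would be replaced by penalized versions.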

Paper Structure

This paper contains 19 sections, 4 theorems, 49 equations, 6 figures, 4 tables.

Key Result

Theorem 3.1

Let $\boldsymbol{W}_i = (T_i, Y_i, \boldsymbol{X}_i)$ be i.i.d. random vectors with density $f(\boldsymbol{W}_i; \boldsymbol{\eta})$. Suppose that regularity conditions A.8–A.10 are relaxed. If …, then there exists a local minimizer $\hat{\boldsymbol{\eta}}$ of the objective function $\boldsymbol{Q}_n(\boldsymbol{\eta})$ such that …, where $\alpha_n = n^{-1/2} + a_n$ and $a_n$ is the maximum derivative of …
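The rate $\alpha_n = n^{-1/2} + a_n$ says that the penalty can slow convergence only through its slope at the nonzero true coefficients. Because the statement above is truncated, the following sketch assumes the usual reading from nonconvex penalty theory (as in Fan and Li's SCAD analysis): $a_n = \max_j |p'_{\lambda_n}(|\eta_{j0}|)|$ over the nonzero true coefficients $\eta_{j0}$. It computes $a_n$ and $\alpha_n$ for two common nonconvex penalties, SCAD with $a = 3.7$ and MCP with $\gamma = 3$; which penalty the paper actually uses is not stated here, and the helper `rate_alpha_n` and the coefficient vector are hypothetical.

```python
import numpy as np


def scad_derivative(theta, lam, a=3.7):
    """Derivative p'_lam(theta) of the SCAD penalty for theta >= 0."""
    theta = np.abs(np.asarray(theta, dtype=float))
    tail = np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam)
    return lam * np.where(theta <= lam, 1.0, tail)


def mcp_derivative(theta, lam, gamma=3.0):
    """Derivative p'_lam(theta) of the minimax concave penalty (MCP) for theta >= 0."""
    theta = np.abs(np.asarray(theta, dtype=float))
    return np.maximum(lam - theta / gamma, 0.0)


def rate_alpha_n(eta_true, lam, n, penalty_derivative=scad_derivative):
    """Hypothetical helper: a_n = max |p'_lam(|eta_j0|)| over nonzero eta_j0,
    returned together with alpha_n = n^(-1/2) + a_n."""
    eta_true = np.asarray(eta_true, dtype=float)
    nonzero = np.abs(eta_true[eta_true != 0])
    a_n = float(np.max(penalty_derivative(nonzero, lam))) if nonzero.size else 0.0
    return n ** -0.5 + a_n, a_n


eta_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])  # hypothetical true coefficients
print(rate_alpha_n(eta_true, lam=0.1, n=100))  # a_n = 0: every nonzero |eta_j0| > 3.7 * 0.1
print(rate_alpha_n(eta_true, lam=0.5, n=100))  # a_n = 0.5, driven by the coefficient 0.5 <= lam
print(rate_alpha_n(eta_true, lam=0.5, n=100, penalty_derivative=mcp_derivative))
```

When $\lambda$ is small enough that every nonzero coefficient exceeds $a\lambda$, the SCAD derivative vanishes there, so $a_n = 0$ and the bound reduces to the parametric rate $n^{-1/2}$; a larger $\lambda$ leaves $a_n > 0$ and loosens the bound.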

Figures (6)

  • Figure 1: Performance metrics by estimator under no contamination ($\gamma = 0.0$).
  • Figure 2: Performance metrics by estimator under mild contamination ($\gamma = 0.1$).
  • Figure 3: Performance metrics by estimator under heavy contamination ($\gamma = 0.2$).
  • Figure 4: Coverage rate across contamination levels ($\gamma = 0.0, 0.1, 0.2$).
  • Figure 5: Average interval width across contamination levels.
  • ...and 1 more figure

Theorems & Definitions (7)

  • Theorem 3.1: Local Consistency of Penalized Empirical Likelihood Estimator
  • Proof
  • Theorem 3.2: Sparsity Consistency
  • Theorem 3.3: Asymptotic Normality of the Proposed ATE Estimator
  • Proof
  • Theorem 3.4: Robust Consistency and Outlier Resistance of the Proposed ATE Estimator
  • Proof