
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Valentyn Melnychuk, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

TL;DR

The paper introduces Overlap-Adaptive Regularization (OAR), which regularizes target models according to overlap weights so that, informally, the regularization is stronger in regions with low overlap.

Abstract

The conditional average treatment effect (CATE) is widely used in personalized medicine to inform therapeutic decisions. However, state-of-the-art methods for CATE estimation (so-called meta-learners) often perform poorly in the presence of low overlap. In this work, we introduce a new approach to tackle this issue and improve the performance of existing meta-learners in low-overlap regions. Specifically, we introduce Overlap-Adaptive Regularization (OAR), which regularizes target models proportionally to overlap weights so that, informally, the regularization is higher in regions with low overlap. To the best of our knowledge, our OAR is the first approach to leverage overlap weights in the regularization terms of meta-learners. Our OAR approach is flexible and works with any existing CATE meta-learner: we demonstrate how OAR can be applied to both parametric and non-parametric second-stage models. Furthermore, we propose debiased versions of our OAR that preserve the Neyman-orthogonality of existing meta-learners and thus ensure more robust inference. Through a series of (semi-)synthetic experiments, we demonstrate that our OAR significantly improves CATE estimation in low-overlap settings in comparison to constant regularization.
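
The following is a minimal sketch of what "higher regularization in low-overlap regions" can look like. It assumes overlap weights of the form $\nu(x) = \pi(x)(1 - \pi(x))$, where $\pi(x) = \mathbb{P}(A = 1 \mid X = x)$ is the propensity score. It also uses one hypothetical regularization function, $\lambda(\nu) = \gamma / (4\nu)$, which equals $\gamma$ at perfect overlap ($\pi = 0.5$) and grows as $\pi$ approaches 0 or 1. The names `overlap_weight`, `oar_strength`, and `gamma` are illustrative only. The paper's own regularization functions (e.g., $\tilde{\lambda}_\mathrm{m}$) may take a different form.

```python
import numpy as np

def overlap_weight(propensity):
    """Overlap weight nu(x) = pi(x) * (1 - pi(x)); maximal (0.25) at pi = 0.5."""
    propensity = np.asarray(propensity, dtype=float)
    return propensity * (1.0 - propensity)

def oar_strength(propensity, gamma=1.0, eps=1e-6):
    """Hypothetical regularization function lambda(nu) = gamma * 0.25 / nu.

    Decreasing in nu, so samples with low overlap (pi near 0 or 1) receive a
    larger regularization strength. eps guards against division by zero
    when an estimated propensity is exactly 0 or 1.
    """
    nu = overlap_weight(propensity)
    return gamma * 0.25 / np.maximum(nu, eps)

# Estimated propensities moving away from perfect overlap (0.5).
pi_hat = np.array([0.5, 0.3, 0.1, 0.02])
strengths = oar_strength(pi_hat, gamma=1.0)
print(np.round(strengths, 3))  # strength increases as overlap decreases
```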

Paper Structure

This paper contains 36 sections, 19 theorems, 79 equations, 4 figures, 9 tables, 1 algorithm.

Key Result

Proposition 1

The average amount of overlap-adaptive regularization $\mathbb{E}[\lambda(\nu(X))]$ is equal to or upper-bounded by $f$-divergences between $\mathbb{P}(X)$ and $\mathbb{P}(X \mid A = a)$ for $a \in \{0, 1\}$.
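
As one concrete instance of the "equal to" case, assume the hypothetical choice $\lambda(\nu) = 1/\nu$ with $\nu(x) = \pi(x)(1 - \pi(x))$. Two facts combine here. First, $1/\nu = 1/\pi + 1/(1 - \pi)$. Second, Bayes' rule gives $\mathbb{P}(X \mid A = a) \propto \mathbb{P}(X)\,\mathbb{P}(A = a \mid X)$. Together they yield $\mathbb{E}[1/\nu(X)] = \sum_{a \in \{0, 1\}} \big(1 + \chi^2(\mathbb{P}(X) \,\|\, \mathbb{P}(X \mid A = a))\big) / \mathbb{P}(A = a)$, where $\chi^2$ is the Pearson $\chi^2$-divergence (an $f$-divergence). The sketch below checks this identity numerically for a made-up discrete covariate. It illustrates the flavor of Proposition 1 and is not necessarily the paper's choice of $\lambda$ or its proof.

```python
import numpy as np

# Hypothetical discrete covariate with 4 values: marginal P(X) and propensity pi(x) = P(A=1 | X=x).
p_x = np.array([0.4, 0.3, 0.2, 0.1])
pi = np.array([0.5, 0.7, 0.9, 0.05])

nu = pi * (1.0 - pi)                 # overlap weights
lhs = np.sum(p_x / nu)               # E[lambda(nu(X))] with lambda(nu) = 1 / nu

p1 = np.sum(p_x * pi)                # P(A = 1)
p0 = 1.0 - p1                        # P(A = 0)
q1 = p_x * pi / p1                   # P(X | A = 1) via Bayes' rule
q0 = p_x * (1.0 - pi) / p0           # P(X | A = 0)

def chi2(p, q):
    """Pearson chi-squared divergence chi2(P || Q) = sum_x p(x)^2 / q(x) - 1."""
    return np.sum(p**2 / q) - 1.0

rhs = (1.0 + chi2(p_x, q1)) / p1 + (1.0 + chi2(p_x, q0)) / p0
print(lhs, rhs)  # the two values agree up to floating-point error
```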

Figures (4)

  • Figure 1: Motivational example showing that our OAR (in red) performs better in low-overlap regions (in yellow). Here, we combined our OAR with a DR-learner, adapted the synthetic data generator from melnychuk2023normalizing ($n_{\text{train}} = 250$; see the Appendix), and used kernel ridge regression (KRR) as the target model. A target model fitted with our OAR($\tilde{\lambda}_\mathrm{m}$) (red) performs better in the low-overlap regions than a target model with constant regularization (CR, blue).
  • Figure 2: Results for IHDP dataset experiments. Reported: median rPEHE$_{\text{out}}$ $\pm$ se over 100 runs.
  • Figure 3: An overview of our OAR for a neural network as a parametric target model $g$. Our OAR and debiased OAR are used at the second stage of the meta-learner to regularize the target network depending on the level of overlap (lower overlap leads to stronger regularization). Here, we instantiate OAR with noise injection in the middle layer of $g$: (i) OAR noise regularization and (ii) OAR dropout.
  • Figure 4: Results for synthetic experiments. Reported: rPEHE$_{\text{out}}$; mean $\pm$ se over 40 runs. Lower is better.
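
Figure 3 (ii) describes OAR dropout in the middle layer of a neural target model. Below is a minimal NumPy sketch of that idea. It assumes a per-sample dropout rate that rises as the estimated overlap weight $\nu(x) = \pi(x)(1 - \pi(x))$ falls. The mapping `oar_dropout_rate`, the cap `p_max`, and the two-layer ReLU architecture are all hypothetical choices made for illustration. They are not the paper's implementation.

```python
import numpy as np

def oar_dropout_rate(propensity, p_max=0.5):
    """Hypothetical mapping from overlap to a per-sample dropout rate.

    nu = pi * (1 - pi) lies in [0, 0.25], so 1 - 4 * nu is 0 at pi = 0.5
    (best overlap) and approaches 1 as pi -> 0 or 1 (worst overlap).
    """
    nu = propensity * (1.0 - propensity)
    return p_max * (1.0 - 4.0 * nu)

def mlp_forward(x, propensity, params, rng=None, training=True, p_max=0.5):
    """Two-layer MLP g(x) with overlap-dependent inverted dropout on the hidden layer."""
    w1, b1, w2, b2 = params
    h = np.maximum(x @ w1 + b1, 0.0)                       # ReLU hidden layer, shape (n, d_h)
    if training:
        p = oar_dropout_rate(propensity, p_max)[:, None]    # per-sample rate, shape (n, 1)
        keep = rng.random(h.shape) >= p                     # keep each unit with prob. 1 - p
        h = h * keep / (1.0 - p)                            # rescale so that E[dropped h] = h
    return h @ w2 + b2

rng = np.random.default_rng(0)
n, d, d_h = 8, 3, 16
params = (rng.normal(size=(d, d_h)), np.zeros(d_h), rng.normal(size=(d_h, 1)), np.zeros(1))
x = rng.normal(size=(n, d))
pi_hat = rng.uniform(0.02, 0.98, size=n)                   # stand-in for estimated propensities

y_train = mlp_forward(x, pi_hat, params, rng=rng, training=True)   # stochastic, regularized pass
y_eval = mlp_forward(x, pi_hat, params, training=False)            # deterministic pass
```

Inverted dropout (rescaling by $1/(1 - p)$) keeps the expected hidden activation unchanged. At evaluation time the network is therefore used without any masking, and samples with perfect overlap ($\pi = 0.5$) are never dropped.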

Theorems & Definitions (33)

  • Definition 1: Overlap-adaptive regularization (explicit form)
  • Proposition 1: Average regularization function as a distributional distance
  • Proposition 2: Explicit form of OAR noise regularization in linear $g$
  • Proposition 3: Explicit form of OAR dropout in linear $g$
  • Proposition 4: Debiased OAR
  • proof
  • Proposition 5: Excess prediction risk of our OAR/dOAR dropout with linear second-stage model
  • proof
  • Proposition 6: Kernel ridge regression with an OAR-based RKHS norm
  • Definition 2: Neyman-orthogonality (chernozhukov2017double; foster2023orthogonal; morzywolek2023general)
  • ...and 23 more
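
A standard identity helps explain why dropout acts as a data-dependent penalty in a linear second-stage model, the setting of Propositions 3 and 5. Apply inverted input dropout at rate $p$, that is, $\tilde{x}_j = x_j m_j / (1 - p)$ with $m_j \sim \mathrm{Bernoulli}(1 - p)$. Then $\mathbb{E}[(y - \tilde{x}^\top w)^2] = (y - x^\top w)^2 + \frac{p}{1 - p} \sum_j w_j^2 x_j^2$. If $p = p(x)$ is chosen to grow as overlap shrinks, this penalty becomes larger for low-overlap samples. The Monte Carlo check below uses made-up numbers and is only an illustrative sketch. It is not claimed to reproduce the exact statements of Propositions 3 or 5.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
x = rng.normal(size=d)          # one covariate vector
w = rng.normal(size=d)          # linear target model g(x) = x @ w
y = 0.7                         # a (pseudo-)outcome for this sample (made up)
p = 0.3                         # this sample's dropout rate, e.g., from an overlap-based mapping

# Monte Carlo estimate of the expected loss under inverted input dropout.
n_mc = 400_000
masks = rng.random((n_mc, d)) >= p          # m_j ~ Bernoulli(1 - p)
x_tilde = x * masks / (1.0 - p)             # unbiased: E[x_tilde] = x
mc_loss = np.mean((y - x_tilde @ w) ** 2)

# Closed form: squared error plus a data-dependent ridge penalty p/(1-p) * sum_j w_j^2 x_j^2.
closed_form = (y - x @ w) ** 2 + p / (1.0 - p) * np.sum(w**2 * x**2)
print(mc_loss, closed_form)
```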