
Orthogonal Learner for Estimating Heterogeneous Long-Term Treatment Effects

Haorui Ma, Dennis Frauen, Valentyn Melnychuk, Stefan Feuerriegel

Abstract

Estimation of heterogeneous long-term treatment effects (HLTEs) is widely used for personalized decision-making in marketing, economics, and medicine, where short-term randomized experiments are often combined with long-term observational data. However, HLTE estimation is challenging due to limited overlap in treatment or in observing long-term outcomes for certain subpopulations, which can lead to unstable HLTE estimates with large finite-sample variance. To address this challenge, we introduce the LT-O-learners (Long-Term Orthogonal Learners), a set of novel orthogonal learners for HLTE estimation. The learners are designed for the canonical HLTE setting that combines a short-term randomized dataset $\mathcal{D}_1$ with a long-term historical dataset $\mathcal{D}_2$. The key idea of our LT-O-learners is to retarget the learning objective by introducing custom overlap weights that downweight samples with low overlap in treatment or in long-term observation. We show that the retargeted loss is equivalent to a weighted oracle loss and satisfies Neyman-orthogonality, which means our learners are robust to errors in the nuisance estimation. We further provide a general error bound for the LT-O-learners and give the conditions under which a quasi-oracle rate can be achieved. Finally, our LT-O-learners are model-agnostic and can thus be instantiated with arbitrary machine learning models. We conduct empirical evaluations on synthetic and semi-synthetic benchmarks to confirm the theoretical properties of our LT-O-learners, especially their robustness in low-overlap settings. To the best of our knowledge, ours are the first orthogonal learners for HLTE estimation that are robust to the low overlap common in long-term outcomes.
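To make the retargeting idea concrete, the following schematic shows one standard way such an overlap-weighted objective can be written. This is an illustration of the idea described in the abstract, not the paper's exact objective; the symbols $\tau$, $\pi$, $\rho$, and the specific form of $\omega$ are assumed notation.

```latex
% Schematic overlap-weighted (retargeted) oracle loss -- illustrative only,
% not necessarily the paper's exact definition. Here g is the candidate HLTE
% model, \tau(x) the true HLTE, \pi(x) the treatment propensity, and \rho(x)
% the probability that the long-term outcome is observed (assumed notation).
% The weight \omega(x) shrinks where treatment or long-term overlap is low.
\mathcal{L}_{\omega}(g) = \mathbb{E}\!\left[\, \omega(X)\,\bigl(g(X) - \tau(X)\bigr)^{2} \right],
\qquad
\omega(x) \propto \pi(x)\bigl(1 - \pi(x)\bigr)\,\rho(x).
```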

Paper Structure

This paper contains 33 sections, 7 theorems, 146 equations, 3 figures, and 3 tables.

Key Result

Theorem 5.1

The proposed loss $\mathcal{L}_\omega(g, \eta)$ is Neyman-orthogonal w.r.t. the nuisance $\eta$.
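For reference, Neyman-orthogonality of a loss is conventionally stated as a vanishing directional (Gateaux) derivative with respect to the nuisance, evaluated at the true nuisance $\eta^*$; the paper's precise regularity conditions may differ from this standard form.

```latex
% Conventional statement of Neyman-orthogonality (the paper's exact
% conditions may differ). \eta^* denotes the true nuisance, and the
% derivative is taken along the direction \eta - \eta^*.
\frac{\partial}{\partial r}\,
\mathcal{L}_{\omega}\bigl(g,\; \eta^{*} + r\,(\eta - \eta^{*})\bigr)
\Big|_{r=0} = 0
\quad \text{for all admissible directions } \eta.
```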

Figures (3)

  • Figure 1: Our HLTE setting: The yellow and green boxes illustrate our two-sample setting. $\mathcal{D}_1$ and $\mathcal{D}_2$ share covariates $X$ and surrogates $S$. The surrogate index (i.e., $\mathbb{E}_{\mathcal{D}_2}[Y|S,X]$), learned on the long-term dataset, is used as a proxy for long-term outcomes in the short-term experiment (see the sketch after this list). The left plot shows the two overlap issues.
  • Figure 2: Variance of the estimators across different overlap scenarios ($\gamma$): Low overlap leads to a high finite-sample variance of the DR-learner, while our LT-O-learners are robust and maintain a low variance.
  • Figure 3: Benefit of Neyman-orthogonality: (Left) PEHE (precision in estimating heterogeneous effects) vs. sample size. The orthogonal LT-O-DO-learner outperforms the weighted (but non-orthogonal) DR/RA-learner. (Right) Mean squared error (MSE) of the nuisance estimation. Taken together, high errors in the nuisance functions at small sample sizes drive the instability of non-orthogonal baselines.
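As referenced in the Figure 1 caption, the surrogate-index construction can be sketched in a few lines: fit $\mathbb{E}_{\mathcal{D}_2}[Y|S,X]$ on the long-term data, then evaluate it on the short-term experiment. The function and array names below (surrogate_index_proxy, X1, S1, X2, S2, Y2) are hypothetical placeholders, and the paper's LT-O-learners additionally apply overlap weighting and orthogonal corrections on top of this basic proxy step.

```python
# Minimal sketch of the surrogate-index idea from Figure 1 (illustrative only;
# all names are hypothetical and the regression model choice is arbitrary).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def surrogate_index_proxy(X2, S2, Y2, X1, S1):
    """Fit E[Y | S, X] on the long-term dataset D2 = (X2, S2, Y2) and
    evaluate it on the short-term experiment D1 = (X1, S1) as a proxy
    for the unobserved long-term outcomes."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(np.column_stack([S2, X2]), Y2)          # learn the surrogate index on D2
    return model.predict(np.column_stack([S1, X1]))   # proxy long-term outcomes on D1
```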

Theorems & Definitions (11)

  • Theorem 5.1: Neyman-orthogonality (with proof)
  • Theorem 5.2: Oracle equivalence (with proof)
  • Theorem 5.3: Error bounds (with proof)
  • Corollary 5.4: Quasi-oracle rate condition
  • Lemma A.1: Identification of HLTE (with proof)
  • Lemma A.2: Neyman-orthogonality
  • ...and 1 more