Model-agnostic meta-learners for estimating heterogeneous treatment effects over time

Dennis Frauen, Konstantin Hess, Stefan Feuerriegel

TL;DR

This work develops model-agnostic meta-learners for estimating time-varying heterogeneous treatment effects from observational data, addressing time-varying confounding, history growth, and overlap challenges. It introduces four adjustment families—history, regression (G-computation), propensity, and doubly robust—with plug-in and two-stage variants, and provides a comprehensive theoretical analysis of their bias and convergence rates. A key contribution is the IVW-DR-learner, which stabilizes the DR loss via inverse-variance weights to mitigate high variance under weak overlap, especially for long horizons. Empirical results on simulated and real-world data demonstrate the superiority of the IVW-DR-learner in low-overlap and long-horizon regimes, while confirming the model-agnostic nature of the approach using transformer-based backbones.
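To make the overlap problem for long horizons concrete, the following is a minimal sketch, not the paper's estimator. It assumes a simplified setting in which the weight for following a treatment sequence is the product of inverse per-step propensities; the function name `sequence_ipw_weight` and the example propensities are hypothetical.

```python
def sequence_ipw_weight(step_propensities):
    """Inverse-propensity weight for a whole treatment sequence.

    Assumption (illustrative only): the weight is the product of
    1 / p_k over all steps k, where p_k is the probability of the
    observed treatment at step k given the history.
    """
    weight = 1.0
    for p in step_propensities:
        if not 0.0 < p <= 1.0:
            raise ValueError("each propensity must lie in (0, 1]")
        weight /= p
    return weight


# With the same per-step propensity, the weight grows geometrically
# in the number of steps, so a few samples dominate the loss.
for horizon in (1, 3, 5):
    print(horizon, sequence_ipw_weight([0.25] * horizon))
```

Under this simplification, even moderate per-step propensities yield very large weights once several steps are multiplied together, which is the variance problem that a stabilized loss aims to reduce.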

Abstract

Estimating heterogeneous treatment effects (HTEs) over time is crucial in many disciplines such as personalized medicine. For example, electronic health records are commonly collected over several time periods and then used to personalize treatment decisions. Existing works for this task have mostly focused on model-based learners (i.e., learners that adapt specific machine-learning models). In contrast, model-agnostic learners -- so-called meta-learners -- are largely unexplored. In our paper, we propose several meta-learners that are model-agnostic and thus can be used in combination with arbitrary machine learning models (e.g., transformers) to estimate HTEs over time. Here, our focus is on learners that can be obtained via weighted pseudo-outcome regressions, which allows for efficient estimation by targeting the treatment effect directly. We then provide a comprehensive theoretical analysis that characterizes the different learners and that allows us to offer insights into when specific learners are preferable. Finally, we confirm our theoretical insights through numerical experiments. In sum, while meta-learners are already state-of-the-art for the static setting, we are the first to propose a comprehensive set of meta-learners for estimating HTEs in the time-varying setting.
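The idea of a weighted pseudo-outcome regression can be sketched in the simpler static setting with one binary treatment. The example below is an illustration under stated assumptions, not the learners proposed in the paper: the nuisance functions (`pi`, `mu0`, `mu1`) are taken as known oracle values rather than estimated, the second stage is a linear model, and the weights `pi * (1 - pi)` are chosen here because they equal the inverse conditional variance of the pseudo-outcome noise in this particular simulation. All function names are hypothetical.

```python
import numpy as np


def dr_pseudo_outcome(y, a, pi, mu1, mu0):
    """Doubly robust (AIPW-style) pseudo-outcome for one binary treatment."""
    return mu1 - mu0 + a * (y - mu1) / pi - (1 - a) * (y - mu0) / (1 - pi)


def weighted_least_squares(x, target, weights):
    """Second stage: fit target ~ intercept + slope * x with sample weights."""
    design = np.column_stack([np.ones(len(x)), x])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], target * sqrt_w, rcond=None)
    return coef


# Simulated data (assumption: true effect is tau(x) = 1 + 2x).
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)
pi = 1.0 / (1.0 + np.exp(-2.0 * x))      # propensity score
a = rng.binomial(1, pi)                  # observed treatment
tau_true = 1.0 + 2.0 * x
y = x + a * tau_true + rng.normal(0.0, 1.0, n)

# Oracle nuisances (assumption: in practice these are estimated first).
mu0 = x
mu1 = x + tau_true

pseudo = dr_pseudo_outcome(y, a, pi, mu1, mu0)
weights = pi * (1.0 - pi)                # downweights low-overlap samples
coef = weighted_least_squares(x, pseudo, weights)
print("estimated intercept and slope:", coef)
```

Because the second stage regresses the pseudo-outcome directly on the covariate, any regression model that accepts sample weights could replace the linear fit here, which is what makes this style of learner model-agnostic.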

Paper Structure

This paper contains 27 sections, 4 theorems, 54 equations, 4 figures, 7 tables.

Key Result

Theorem 1

Under the identifiability through sample-splitting assumptions, we obtain the following rates: $\mathrm{bias}_{\bar{a}, \bar{b}} = \mathbb{E}\left[Y_{t+\tau} \mid \bar{H}_{t} = \bar{h}_{t}, \bar{A}_{t:t+\tau} = \bar{a}_{t:t+\tau}\right] - \mathbb{E}\left[Y_{t+\tau} \mid \bar{H}_{t} = \bar{h}_{t}, \bar{A}_{t:t+\tau} = \bar{b}_{t:t+\tau}\right] - \tau_{\bar{a}, \bar{b}}(\bar{h}_{t})$.

Figures (4)

  • Figure 1: Setting for estimating treatment effects over time.
  • Figure 2: Overview of the different nuisance estimators and meta-learners proposed in this paper.
  • Figure 3: RMSE of DR- and IVW-DR-learner for different levels of overlap averaged over $5$ random seeds. Larger $\gamma$ implies lower overlap.
  • Figure 4: Results for real-world medical data. Blood pressure predictions of meta-learners for a single patient that was never treated.

Theorems & Definitions (11)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Lemma 1: G-formula (Robins, 1999)
  • proof
  • proof
  • Lemma 2
  • proof
  • Definition 1
  • ...and 1 more