A Doubly Robust Machine Learning Approach for Disentangling Treatment Effect Heterogeneity with Functional Outcomes

Filippo Salmaso; Lorenzo Testa; Francesca Chiaromonte

TL;DR

FOCaL introduces a doubly robust meta-learner for estimating functional heterogeneous treatment effects (F-CATE) from observational data. Extending the DR-Learner framework to functional outcomes, it estimates the nuisance functions $\hat{\mu}^{(a)}(x)$ (outcome regressions) and $\hat{\pi}(x)$ (propensity score), constructs functional pseudo-outcomes $\hat{\gamma}^{(a)}(D)$, and regresses their difference on covariates to obtain $\hat{\theta}(x)$; cross-fitting and bootstrap-based bands provide valid simultaneous inference. Theoretical guarantees include an oracle property and valid simultaneous coverage; empirically, FOCaL remains accurate in simulations when one nuisance model is misspecified, and it reveals nuanced heterogeneity in two real datasets (SHARE and COVID-19 in Italy). The target $\theta^*(x)$ captures the time-varying, covariate-dependent treatment effect, so the framework describes how interventions affect entire outcome trajectories across subpopulations rather than a single scalar effect, with implications for personalized medicine and adaptive policymaking.
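The pipeline summarized above (nuisance estimation, pseudo-outcome construction, second-stage regression, cross-fitting) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes outcomes observed on a common grid of `T` points, linear outcome models, a logistic propensity model, and a linear second-stage regression, where the paper uses functional regression techniques. All function names (`fit_linear`, `fit_logistic`, `dr_functional_cate`, and so on) and the simulated data are hypothetical.

```python
import numpy as np

def fit_linear(X, Y):
    """Least-squares fit with intercept; Y may be (n, T), one column per grid point."""
    Xd = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def fit_logistic(X, a, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method (tiny ridge for numerical stability)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (a - p) - ridge * beta
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_propensity(beta, X, clip=0.01):
    """Estimated P(A=1 | X), clipped away from 0 and 1 (an assumption of this sketch)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return np.clip(1.0 / (1.0 + np.exp(-Xd @ beta)), clip, 1.0 - clip)

def dr_functional_cate(X, A, Y, n_folds=2, seed=0):
    """Cross-fitted DR-learner on gridded curves; returns linear F-CATE coefficients (p+1, T)."""
    n = len(A)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    pseudo = np.empty_like(Y, dtype=float)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models are fit on the other folds only (cross-fitting).
        mu1 = fit_linear(X[train & (A == 1)], Y[train & (A == 1)])
        mu0 = fit_linear(X[train & (A == 0)], Y[train & (A == 0)])
        beta = fit_logistic(X[train], A[train])
        m1, m0 = predict_linear(mu1, X[test]), predict_linear(mu0, X[test])
        pi = predict_propensity(beta, X[test])[:, None]
        a = A[test][:, None]
        # Doubly robust pseudo-outcomes for each arm, evaluated at every grid point.
        g1 = m1 + a / pi * (Y[test] - m1)
        g0 = m0 + (1 - a) / (1 - pi) * (Y[test] - m0)
        pseudo[test] = g1 - g0
    # Second stage: regress the pseudo-outcome difference on the covariates.
    return fit_linear(X, pseudo)

# Simulated example with confounding through X[:, 0].
rng = np.random.default_rng(1)
n, T = 4000, 20
t = np.linspace(0.0, 1.0, T)
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.8 * X[:, 0])))
theta_true = (1.0 + X[:, [0]]) * np.sin(np.pi * t)[None, :]
Y = X[:, [1]] * t[None, :] + A[:, None] * theta_true + rng.normal(scale=0.5, size=(n, T))

coef = dr_functional_cate(X, A, Y)
# True second-stage coefficients: intercept and X0 slope equal sin(pi t), X1 slope is zero.
truth = np.vstack([np.sin(np.pi * t), np.sin(np.pi * t), np.zeros(T)])
print("max abs coefficient error:", np.max(np.abs(coef - truth)))
```

In this toy setting both nuisance models are correctly specified, so the recovered coefficient curves should track the true ones closely. Replacing either nuisance fit with a deliberately wrong model is a quick way to probe the double robustness property described above.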

Abstract

Causal inference is paramount for understanding the effects of interventions, yet extracting personalized insights from increasingly complex data remains a significant challenge for modern machine learning. This is the case, in particular, when considering functional outcomes observed over a continuous domain (e.g., time, or space). Estimation of heterogeneous treatment effects, known as CATE, has emerged as a crucial tool for personalized decision-making, but existing meta-learning frameworks are largely limited to scalar outcomes, failing to provide satisfying results in scientific applications that leverage the rich, continuous information encoded in functional data. Here, we introduce FOCaL (Functional Outcome Causal Learning), a novel, doubly robust meta-learner specifically engineered to estimate a functional heterogeneous treatment effect (F-CATE). FOCaL integrates advanced functional regression techniques for both outcome modeling and functional pseudo-outcome reconstruction, thereby enabling the direct and robust estimation of F-CATE. We provide a rigorous theoretical derivation of FOCaL, demonstrate its performance and robustness compared to existing non-robust functional methods through comprehensive simulation studies, and illustrate its practical utility on diverse real-world functional datasets. FOCaL advances the capabilities of machine intelligence to infer nuanced, individualized causal effects from complex data, paving the way for more precise and trustworthy AI systems in personalized medicine, adaptive policy design, and fundamental scientific discovery.

Paper Structure (23 sections, 4 theorems, 27 equations, 14 figures, 2 tables)

Introduction
Results
Overview of FOCaL
Simulation results provide strong evidence of FOCaL's effectiveness
FOCaL reveals the heterogeneous effects of chronic conditions on quality of life
FOCaL sheds light on the heterogeneous role of distributed primary health care in shaping COVID-19 mortality patterns
Discussion
Methods
Notation and target identification
FOCaL in detail
Theoretical guarantees and inference
Simulation study
SHARE study
COVID-19 study
Technical Lemmas
...and 8 more sections

Key Result

Lemma 4.2

Under the identifiability assumption, the F-CATE target can be rewritten as:
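The displayed equation is not available in this summary. As a hedged illustration only, under the usual identification conditions (consistency, no unmeasured confounding, and positivity), results of this type typically express the target through observable conditional means. The form below is an assumption about what the lemma states, written in the notation of the TL;DR, with $t$ denoting a point of the functional domain:

$$
\theta^*(x)(t) \;=\; \mathbb{E}\left[Y(t) \mid X = x, A = 1\right] - \mathbb{E}\left[Y(t) \mid X = x, A = 0\right] \;=\; \mu^{(1)}(x)(t) - \mu^{(0)}(x)(t).
$$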

Figures (14)

Figure 1: Performance and robustness of FOCaL in simulated settings. (a) Comparison of functional outcome trajectories for treated (left) and untreated (right) populations. Solid lines represent the naive observed means, which exhibit significant bias due to confounding (departures from the Oracle, which is shown in solid gray). Lines with symbols (circles, triangles, squares, diamonds) represent the model-recovered trajectories under different specifications; note that FOCaL accurately reconstructs the true curves even when one nuisance component is misspecified, correcting the bias inherent in the raw data. (b) Density plots for original covariates ($X_j$, $j=0,1,2,3$) illustrating the distributional imbalance between treated (red) and control (blue) groups, which motivates the adjustment (the distributions differ especially for $X_0$). (c) Evaluation of FATE estimation accuracy. The line plot displays the estimated Functional Average Treatment Effect (FATE) over time; estimates remain stable and close to the truth (Oracle, in solid gray) unless both models are misspecified. (d) Example of F-CATE target. The surface plot visualizes the estimated Functional Conditional Average Treatment Effect (F-CATE) as a function of time and covariate $X_0$, demonstrating FOCaL's ability to capture complex, heterogeneous effect surfaces (the true surface is shown for reference in Supplementary Figure C.1). (e) Evaluation of F-CATE estimation accuracy. The boxplots show Aggregated Root Mean Squared Error (ARMSE) values across 1000 Monte Carlo replications, confirming that sizeable error inflation occurs only under double misspecification.
Figure 2: Heterogeneous impact of hypertension on quality of life trajectories (SHARE study). (a) Distributions of key baseline covariates for treated (hypertension) and control populations. While most socio-demographic factors show comparable spread, age exhibits a sizeable imbalance between groups; this motivates the propensity adjustment performed by FOCaL. (b) Results for CASP. The estimated Functional Average Treatment Effect (FATE) and the associated simultaneous 95% confidence band over the 192-month observation window indicate a statistically significant decline in well-being: hypertension exerts a negative and progressively larger effect on the CASP quality of life score. The estimated F-CATE surfaces characterizing the heterogeneity of the causal effect conditioned on age and gender (all other binary and scalar covariates are set to their modal and mean values, respectively) show that the detrimental effect of hypertension intensifies with age across the entire time domain. Gender plays an intercept-like role, amplifying the absolute magnitude of the functional effect for males without markedly altering the shape of the surface. (c) Results for Mobility Index. The estimated Functional Average Treatment Effect (FATE) and the associated simultaneous 95% confidence band show a positive, increasing, and statistically significant effect on the Mobility Index (higher values indicate greater impairment). The estimated F-CATE surfaces reveal a distinct temporal profile for older individuals: a moderate mid-period exacerbation followed by a sharp increase in the final stages of the time domain. This profile is more pronounced and occurs earlier for women than for men. Note: the plots for CASP and the Mobility Index have response-specific vertical scales (the two outcomes are measured on different scales; see the corresponding Supplementary Figure).
Figure 3: Heterogeneous impact of availability of primary health care on COVID-19 mortality patterns (COVID-19 study). (a) Distribution of key socio-demographic and environmental baseline covariates for provinces with high ("Treated") versus low ("Untreated") number of adults per primary care physician (treated provinces are those with poorer distributed primary health care). Sizeable imbalances between groups are observed in most of the covariates, motivating the propensity adjustment performed by FOCaL. (b) Wave 1 results. Top: The estimated Functional Average Treatment Effect (FATE) for the first wave (February-July 2020) suggests that poor distributed primary health care exacerbated mortality in the initial stages, but this is non-significant based on the 95% confidence band. Bottom left: The estimated F-CATE surface conditioned on the "area before" standardized metric disaggregates this effect, revealing a dramatic escalation in provinces with the highest early outbreak intensity, and thus pinpointing how poor distributed primary health care was most consequential where the epidemic had the strongest early momentum. Bottom right: Cuts of the surface plot at the 10th (blue) and 90th (red) percentiles of the marginal distribution of "area before", with 90% confidence bands supporting strong significance at the latter. (c) Wave 2 results. Top: The estimated FATE for the second wave (October 2020–February 2021) appears rather flat with a tight 95% confidence band around $0$, suggesting a negligible impact of distributed primary health care during the more widespread and asynchronous epidemic unfolding that characterized this wave compared to the first. Bottom left: The estimated F-CATE surface confirms a negligible impact also when conditioning on "area before". Bottom right: Cuts of the surface plot at the 10th (blue) and 90th (red) percentiles of the marginal distribution of "area before", again with 90% bands.
Figure C.1: Ground truth F-CATE surface for simulation study. This surface represents the true Functional Conditional Average Treatment Effect (F-CATE) used as the target for one run of the simulation experiments. It is visualized as a function of the functional domain (Time) and the primary covariate of interest ($X_0$), illustrating the complex, non-linear heterogeneity the FOCaL framework is designed to recover.
Figure D.2: Distribution of treatment status for the SHARE hypertension study. Bar chart showing the proportion of subjects who presented with hypertension at the beginning of the study (Treated, $n=419$) compared to those who remained healthy throughout the observation window (Untreated, $n=577$).
...and 9 more figures

Theorems & Definitions (9)

Lemma 4.2
Definition 4.3: Stability
Theorem 4.4: Oracle property
Proposition 4.5: Valid and simultaneous coverage
Lemma A.1
proof
proof
proof
proof
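Proposition 4.5 concerns simultaneous coverage of the confidence bands. One generic way to build a band that covers an entire mean curve at once is a sup-t band calibrated by a Gaussian multiplier bootstrap, sketched below. This is a common textbook construction and is an assumption here: the summary does not specify the paper's bootstrap procedure. The function name `simultaneous_band` and the simulated curves are hypothetical, and the rows of `curves` stand in for per-unit pseudo-outcome differences on a common grid.

```python
import numpy as np

def simultaneous_band(curves, alpha=0.05, n_boot=2000, seed=0):
    """Sup-t band for the mean of i.i.d. curves (rows), via Gaussian multiplier bootstrap."""
    n, _ = curves.shape
    mean = curves.mean(axis=0)
    sd = curves.std(axis=0, ddof=1)
    resid = (curves - mean) / sd              # centred, pointwise-standardized curves
    rng = np.random.default_rng(seed)
    xi = rng.normal(size=(n_boot, n))         # one Gaussian weight vector per draw
    # Supremum over the grid of the absolute standardized bootstrap process.
    sup_stats = np.abs(xi @ resid / np.sqrt(n)).max(axis=1)
    crit = np.quantile(sup_stats, 1.0 - alpha)
    half_width = crit * sd / np.sqrt(n)
    return mean - half_width, mean + half_width, crit

# Toy curves: a sine-shaped mean effect plus independent noise at each grid point.
rng = np.random.default_rng(2)
n, T = 500, 30
t = np.linspace(0.0, 1.0, T)
curves = np.sin(2.0 * np.pi * t)[None, :] + rng.normal(size=(n, T))

lower, upper, crit = simultaneous_band(curves)
print("simultaneous critical value:", crit)
```

The critical value exceeds the pointwise normal quantile of about 1.96 because the band must hold at all grid points jointly; the gap shrinks as the curves become more strongly correlated across the domain.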
