Nonparametric Heterogeneous Long-term Causal Effect Estimation via Data Combination
Weilin Chen, Ruichu Cai, Junjie Wan, Zeqin Yang, José Miguel Hernández-Lobato
TL;DR
This paper addresses the estimation of heterogeneous long-term causal effects (HLCE) from combined short-term experimental and long-term observational data under the Conditional Additive Equi-Confounding Bias assumption. It introduces two-stage regression-based and propensity-based HLCE estimators, along with a multiple robust (MR) estimator that remains consistent when any one of several sets of nuisance functions is correctly specified. The analysis provides convergence guarantees and comparisons with oracle rates. A neural-network-based MR estimator with shared representations is developed to improve practical performance, and experiments on synthetic, semi-synthetic IHDP, and News datasets show improved HLCE accuracy and stability, especially in small-sample regimes. The work offers principled tools for personalized long-term decision-making and clarifies the robustness and efficiency trade-offs in HLCE estimation.
Abstract
Long-term causal inference has drawn increasing attention in many scientific domains. Existing methods mainly focus on estimating average long-term causal effects by combining long-term observational data and short-term experimental data. However, how to estimate heterogeneous long-term causal effects robustly and effectively remains understudied, which significantly limits practical applications. In this paper, we propose several two-stage nonparametric estimators for heterogeneous long-term causal effect estimation, including propensity-based, regression-based, and multiple robust estimators. We provide a comprehensive theoretical analysis of their asymptotic properties under mild assumptions, with the goal of clarifying the conditions under which each estimator can be expected to perform better. Extensive experiments across several semi-synthetic and real-world datasets validate the theoretical results and demonstrate the effectiveness of the proposed estimators.
