Predictive Causal Inference via Spatio-Temporal Modeling and Penalized Empirical Likelihood
Byunghee Lee, Hye Yeon Sin, Joonsung Kang
TL;DR
The paper tackles predictive causal inference in complex spatiotemporal biomedical and environmental data, where high-dimension, low-sample-size (HDLSS) conditions complicate estimation. It proposes an integrated approach that models temporal latent dynamics with a Hidden Markov Model and spatial heterogeneity with a Multi-Task Graph Convolutional Network, treating time and space as endogenous in the outcome model and exogenous in the propensity score model. The pipeline combines covariate balancing propensity scores (CBPS), penalized empirical likelihood with smoothly clipped absolute deviation (SCAD) regularization, and a doubly robust average treatment effect (ATE) estimator to reduce bias and improve predictive accuracy. In simulations and on three real HDLSS datasets, the framework consistently achieves lower bias, mean squared error (MSE), and mean absolute error (MAE) than state-of-the-art baselines, demonstrating practical value for precision medicine and environmental policy.
Abstract
This study introduces an integrated framework for predictive causal inference designed to overcome limitations inherent in conventional single-model approaches. Specifically, we combine a Hidden Markov Model (HMM) for spatial health state estimation with a Multi-Task and Multi-Graph Convolutional Network (MTGCN) for capturing temporal outcome trajectories. The framework treats temporal and spatial information asymmetrically, regarding them as endogenous variables in the outcome regression and as exogenous variables in the propensity score model, thereby extending standard doubly robust treatment effect estimation to jointly improve bias correction and predictive accuracy. To demonstrate its utility, we focus on clinical domains such as cancer, dementia, and Parkinson's disease, where treatment effects are challenging to observe directly. Simulation studies are conducted to emulate latent disease dynamics and to evaluate model performance under varying conditions. Overall, the proposed framework advances predictive causal inference by structurally adapting to the spatiotemporal complexities common in biomedical data.
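
As a point of reference for the doubly robust estimation mentioned above, the following is a minimal sketch of the standard augmented inverse probability weighting (AIPW) form of an ATE estimator. It is not the authors' implementation: the HMM, MTGCN, CBPS, and penalized empirical likelihood components are not reproduced, and the function names (`aipw_ate`, `simulate`), the clipping threshold, and the simulated data-generating process are assumptions introduced purely for illustration.

```python
import numpy as np


def aipw_ate(y, t, e_hat, mu1_hat, mu0_hat, clip=1e-3):
    """Standard AIPW (doubly robust) estimate of the average treatment effect.

    y        : observed outcomes
    t        : binary treatment indicator (1 = treated, 0 = control)
    e_hat    : estimated propensity scores P(T = 1 | X)
    mu1_hat  : outcome-model predictions under treatment
    mu0_hat  : outcome-model predictions under control
    clip     : illustrative bound keeping propensities away from 0 and 1
    """
    y, t, e_hat, mu1_hat, mu0_hat = (
        np.asarray(a, dtype=float) for a in (y, t, e_hat, mu1_hat, mu0_hat)
    )
    e = np.clip(e_hat, clip, 1.0 - clip)
    # Outcome-model prediction plus an inverse-probability-weighted residual.
    psi1 = mu1_hat + t * (y - mu1_hat) / e
    psi0 = mu0_hat + (1.0 - t) * (y - mu0_hat) / (1.0 - e)
    return float(np.mean(psi1 - psi0))


def simulate(n=20000, seed=0):
    """Hypothetical toy data: one confounder x and a true ATE of 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e_true = 1.0 / (1.0 + np.exp(-0.5 * x))   # true propensity score
    t = rng.binomial(1, e_true)
    y = 1.0 + 2.0 * t + x + rng.normal(size=n)
    return x, t, y, e_true


x, t, y, e_true = simulate()

# Case 1: correct propensity score, deliberately wrong outcome model (all zeros).
ate_ps_only = aipw_ate(y, t, e_true, np.zeros_like(y), np.zeros_like(y))

# Case 2: correct outcome model, deliberately wrong propensity score (constant 0.5).
ate_om_only = aipw_ate(y, t, np.full_like(y, 0.5), 3.0 + x, 1.0 + x)

print(f"correct propensity only: {ate_ps_only:.3f}")
print(f"correct outcome model only: {ate_om_only:.3f}")
```

Both cases should land close to the simulated effect of 2, which illustrates the double robustness property: the estimate remains consistent when either the propensity model or the outcome model is correctly specified, though not necessarily when both are wrong.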
