Counterfactual Forecasting For Panel Data
Navonil Deb, Raaz Dwivedi, Sumanta Basu
TL;DR
FOCUS tackles counterfactual forecasting in panel data with missing entries by embedding stochastic dynamics into a low-rank factor model and forecasting through a VAR(1) on the latent factors. It combines PCA-based factor estimation with time-series forecasting to produce out-of-sample counterfactual means, and it comes with nonasymptotic error bounds, asymptotic normality, and valid confidence intervals. Empirical results on simulations and the HeartSteps mobile-health study show that exploiting autoregressive latent dynamics yields more accurate counterfactual forecasts than benchmark methods, which supports prospective decision-making under interventions. The approach offers a principled, scalable framework for forecasting counterfactuals in settings with missing data and temporally dependent latent structure, with potential extensions to nonstationary dynamics and doubly robust estimators.
Abstract
We address the challenge of forecasting counterfactual outcomes in panel data with missing entries and temporally dependent latent factors, a common scenario in causal inference, where estimating unobserved potential outcomes ahead of time is essential. We propose Forecasting Counterfactuals under Stochastic Dynamics (FOCUS), a method that extends traditional matrix completion methods by leveraging the time series dynamics of the factors, thereby enhancing the prediction accuracy of future counterfactuals. Building upon a PCA estimator, our method accommodates both stochastic and deterministic components within the factors and provides a flexible framework for various applications. In the case of stationary autoregressive factors and under standard conditions, we derive error bounds and establish asymptotic normality of our estimator. Empirical evaluations demonstrate that our method outperforms existing benchmarks when the latent factors have an autoregressive component. We apply FOCUS to HeartSteps, a mobile health study, and illustrate its effectiveness in forecasting step counts for users receiving activity prompts by leveraging temporal patterns in user behavior.
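The two-stage idea described above (estimate latent factors by PCA from a partially observed panel, then forecast the factors with a VAR(1)) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `forecast_counterfactual_sketch` is hypothetical, and the zero-fill with rescaling by the observed fraction is a simple stand-in that assumes entries are missing completely at random with a common probability. The paper's estimator, assumptions, and confidence intervals are not reproduced here.

```python
import numpy as np

def forecast_counterfactual_sketch(Y, observed, rank):
    """Illustrative one-step-ahead forecast for an (N, T) panel.

    Y        : (N, T) array of outcomes; unobserved entries may hold any value (or NaN).
    observed : (N, T) boolean mask, True where Y is observed.
    rank     : assumed number of latent factors.
    Returns an (N,) array of forecast means for time T + 1.
    """
    # Assumption: entries are missing completely at random with one common
    # probability, so zero-filling and rescaling by the observed fraction
    # gives an unbiased surrogate for the full panel.
    p_hat = observed.mean()
    Y_filled = np.where(observed, Y, 0.0) / p_hat

    # PCA step: a truncated SVD yields loadings (N, r) and factors (T, r),
    # identified only up to an invertible r x r transformation.
    U, s, Vt = np.linalg.svd(Y_filled, full_matrices=False)
    loadings = U[:, :rank] * s[:rank]
    factors = Vt[:rank].T

    # VAR(1) step: regress each factor vector on its predecessor by least
    # squares, so that factors[1:] is approximately factors[:-1] @ B.
    B, *_ = np.linalg.lstsq(factors[:-1], factors[1:], rcond=None)

    # Propagate the last estimated factor one step and map back to units.
    next_factor = factors[-1] @ B
    return loadings @ next_factor
```

Because the product of loadings and factors is unchanged by the unidentified transformation, the forecast does not depend on which rotation the SVD happens to return. Choosing `rank` and handling nonuniform missingness are left open in this sketch.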
