Causal inference through multi-stage learning and doubly robust deep neural networks
Yuqian Zhang, Jelena Bradic
TL;DR
The paper tackles causal inference in high dimensions where the covariate dimension $p$ grows with the sample size $n$. It develops a multi-stage learning framework using doubly robust deep neural networks (DNNs) with generalized Neyman orthogonality to mitigate error propagation across stages, enabling valid inference even under sparse structural assumptions on $P_{Y|\mathbf X}$. The authors establish convergence rates for nested DNNs and by extension for multilayer perceptrons, derive theoretical guarantees for CATE and dynamic treatment effect (DTE) estimation in high-dimensional settings, and demonstrate cross-fitting strategies to attain valid asymptotic results. The approach yields robust, scalable nonparametric causal estimators that adapt to high dimensionality and complex regression structures, with practical guidance on architecture choices and nuisance-estimation strategies. Overall, the work advances theory for DNN-based multi-stage causal inference under growing $p$, and provides a foundation for applying these methods to heterogeneous treatment effects and sequential decision problems in large-scale observational studies.
Abstract
Deep neural networks (DNNs) have demonstrated remarkable empirical performance in large-scale supervised learning problems, particularly in scenarios where both the sample size $n$ and the dimension of covariates $p$ are large. This study delves into the application of DNNs across a wide spectrum of intricate causal inference tasks, where direct estimation falls short and necessitates multi-stage learning. Examples include estimating the conditional average treatment effect and dynamic treatment effect. In this framework, DNNs are constructed sequentially, with subsequent stages building upon preceding ones. To mitigate the impact of estimation errors from early stages on subsequent ones, we integrate DNNs in a doubly robust manner. In contrast to previous research, our study offers theoretical assurances regarding the effectiveness of DNNs in settings where the dimensionality $p$ expands with the sample size. These findings are significant independently and extend to degenerate single-stage learning problems.
