Causal inference through multi-stage learning and doubly robust deep neural networks

Yuqian Zhang; Jelena Bradic

TL;DR

The paper tackles causal inference in high dimensions where the covariate dimension $p$ grows with the sample size $n$. It develops a multi-stage learning framework using doubly robust deep neural networks (DNNs) with generalized Neyman orthogonality to mitigate error propagation across stages, enabling valid inference even under sparse structural assumptions on $P_{Y|\mathbf X}$. The authors establish convergence rates for nested DNNs and by extension for multilayer perceptrons, derive theoretical guarantees for CATE and dynamic treatment effect (DTE) estimation in high-dimensional settings, and demonstrate cross-fitting strategies to attain valid asymptotic results. The approach yields robust, scalable nonparametric causal estimators that adapt to high dimensionality and complex regression structures, with practical guidance on architecture choices and nuisance-estimation strategies. Overall, the work advances theory for DNN-based multi-stage causal inference under growing $p$, and provides a foundation for applying these methods to heterogeneous treatment effects and sequential decision problems in large-scale observational studies.
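The sketch below gives a rough illustration of two-stage, cross-fitted doubly robust learning for a CATE with a binary covariate. It is a generic DR-learner built under stated assumptions, not the authors' estimator. Within-cell sample means stand in for the DNN nuisance and second-stage regressions, and the names `cell_mean` and `dr_cate` are hypothetical. The steps are:

- Stage 1 fits the propensity score and the two outcome regressions on every fold except the current one.
- The AIPW pseudo-outcome is then computed on the current fold.
- Stage 2 regresses the pooled pseudo-outcomes on $X$.

```python
import random
from collections import defaultdict


def cell_mean(rows, key, value):
    """Average value(r) within each cell key(r); stands in for a fitted regression."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[key(r)] += value(r)
        counts[key(r)] += 1
    return {k: sums[k] / counts[k] for k in sums}


def dr_cate(data, n_folds=2, seed=0):
    """Cross-fitted two-stage DR-learner (hypothetical helper); rows are (x, a, y), x discrete."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    pseudo = []  # (x, pseudo-outcome) pairs collected across folds
    for k in range(n_folds):
        train = [data[i] for j in range(n_folds) if j != k for i in folds[j]]
        # Stage 1: nuisance regressions fitted without fold k.
        pi = cell_mean(train, lambda r: r[0], lambda r: r[1])  # P(A = 1 | X)
        mu1 = cell_mean([r for r in train if r[1] == 1], lambda r: r[0], lambda r: r[2])
        mu0 = cell_mean([r for r in train if r[1] == 0], lambda r: r[0], lambda r: r[2])
        for i in folds[k]:
            x, a, y = data[i]
            p = min(max(pi[x], 0.01), 0.99)  # keep the propensity away from 0 and 1
            resid = y - (mu1[x] if a == 1 else mu0[x])
            # AIPW pseudo-outcome: plug-in contrast plus inverse-weighted residual.
            pseudo.append((x, mu1[x] - mu0[x] + (a / p - (1 - a) / (1 - p)) * resid))
    # Stage 2: regress the pseudo-outcomes on X (again by cell means).
    return cell_mean(pseudo, lambda r: r[0], lambda r: r[1])


# Simulated data: treatment is confounded by x, and the true CATE is 1 + 2x.
gen = random.Random(1)
data = []
for _ in range(4000):
    x = gen.randint(0, 1)
    a = int(gen.random() < 0.3 + 0.4 * x)
    y = 0.5 * x + a * (1.0 + 2.0 * x) + gen.gauss(0, 1)
    data.append((x, a, y))

est = dr_cate(data)
print({x: round(v, 2) for x, v in sorted(est.items())})
```

There are about 2,000 observations per covariate cell, so the two estimates should land close to the true values 1 and 3. Pooling the pseudo-outcomes from every fold in stage 2 is one common variant of cross-fitting. Other variants fit the final regression on a separate fold.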

Abstract

Deep neural networks (DNNs) have demonstrated remarkable empirical performance in large-scale supervised learning problems, particularly in scenarios where both the sample size $n$ and the dimension of covariates $p$ are large. This study investigates the application of DNNs to a wide range of complex causal inference tasks in which direct estimation falls short and multi-stage learning is required. Examples include estimating the conditional average treatment effect and the dynamic treatment effect. In this framework, DNNs are constructed sequentially, with each stage building on the preceding ones. To mitigate the impact of estimation errors from early stages on subsequent ones, we integrate DNNs in a doubly robust manner. In contrast to previous research, our study offers theoretical guarantees for DNNs in settings where the dimension $p$ grows with the sample size. These findings are of independent interest and also apply to degenerate, single-stage learning problems.
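The doubly robust construction is what limits the propagation of early-stage errors. A small simulation, not taken from the paper, makes the property concrete for the average treatment effect. The AIPW estimator stays close to the true ATE of 1 when either the propensity model or the outcome model is deliberately wrong. The naive difference in means, by contrast, is confounded. Known functional forms stand in for fitted nuisances, and `aipw_ate` is a hypothetical name.

```python
import random


def aipw_ate(data, pi, mu1, mu0):
    """AIPW estimate of E[Y(1) - Y(0)] from (x, a, y) rows and nuisance functions of x."""
    total = 0.0
    for x, a, y in data:
        p = pi(x)
        total += (mu1(x) - mu0(x)
                  + a * (y - mu1(x)) / p
                  - (1 - a) * (y - mu0(x)) / (1 - p))
    return total / len(data)


# Simulated data: x confounds treatment and outcome; the true ATE is 1.
gen = random.Random(2)
data = []
for _ in range(20000):
    x = gen.randint(0, 1)
    a = int(gen.random() < 0.2 + 0.6 * x)
    y = 2.0 * x + 1.0 * a + gen.gauss(0, 1)
    data.append((x, a, y))

# Correct nuisance functions and deliberately misspecified ones.
true_pi = lambda x: 0.2 + 0.6 * x
true_mu1 = lambda x: 2.0 * x + 1.0
true_mu0 = lambda x: 2.0 * x
wrong_pi = lambda x: 0.5  # ignores confounding
wrong_mu = lambda x: 0.0  # ignores the outcome model entirely

treated = [y for _, a, y in data if a == 1]
control = [y for _, a, y in data if a == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)
only_pi_right = aipw_ate(data, true_pi, wrong_mu, wrong_mu)
only_mu_right = aipw_ate(data, wrong_pi, true_mu1, true_mu0)
print(f"naive difference in means:        {naive:.2f}")
print(f"AIPW, only propensity correct:    {only_pi_right:.2f}")
print(f"AIPW, only outcome model correct: {only_mu_right:.2f}")
```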

Paper Structure (27 sections, 15 theorems, 189 equations, 1 figure, 3 algorithms)

Introduction
Organization
Notation
Multi-stage learning
Two-stage learning
Three-stage learning
Doubly robust nested DNNs
Construction of DNNs
DNNs through nested doubly robust regression
Heterogeneous treatment effect estimation using DNNs
The doubly robust DNN estimate of CATE
Theoretical properties
Examples
Dynamic treatment effect estimation using DNNs
The sequential doubly robust estimate of DTE using DNNs
...and 12 more sections

Key Result

Theorem 3.1

Let $\widehat{Y}-Y^\#=\Delta_1+\Delta_2$ with some $\Delta_1=\Delta_1(\mathbf{Z};\mathbb{S}^2)$ and $\Delta_2=\Delta_2(\mathbf{Z};\mathbb{S}^2)$ satisfying (…). Assume $\mathbf{X}\subseteq[-1,1]^p$, $\|RY^\#\|_\infty\leq M$, and $\|R\Delta_1\|_{\infty,P_\mathbf{Z}}=O_p(1)$. Let $n\gg WL\log W$, where $L$ and $W$ denote the numbers of hidden layers and weights, respectively. Then, as $n\to\infty$, (…)
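The sample-size condition $n\gg WL\log W$ depends on the architecture only through $L$ and $W$. A short helper shows how these quantities are computed. It assumes a fully connected MLP with scalar output and counts biases together with weights in $W$. That convention is our assumption, since the statement above does not spell it out. The function names are hypothetical.

```python
import math


def mlp_size(p, widths, out_dim=1):
    """Return (L, W) for a fully connected MLP: L hidden layers, W weights plus biases."""
    dims = [p] + list(widths) + [out_dim]
    W = sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))
    return len(widths), W


def size_ratio(n, p, widths):
    """n / (W * L * log W); large values indicate that n >> W L log W holds comfortably."""
    L, W = mlp_size(p, widths)
    return n / (W * L * math.log(W))


print(mlp_size(10, [32, 32]))    # (2, 1441)
print(round(size_ratio(10**6, 10, [32, 32]), 1))
print(mlp_size(1000, [32, 32]))  # (2, 33121)
```

The first layer alone contributes $p$ times its width, so $W$ grows at least linearly in $p$. The condition therefore becomes more demanding as the covariate dimension increases with $n$.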

Figures (1)

Figure 1: Illustrations of general feedforward networks (left) and multilayer perceptrons (right)
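The sketch below makes the distinction in Figure 1 concrete. It evaluates a ReLU network stored as a directed acyclic graph, which we take to be the meaning of a general feedforward network; this reading of the figure is an assumption. It also shows that a layered MLP is the special case in which each unit receives inputs only from the previous layer. `dag_forward` and `mlp_as_dag` are hypothetical helpers.

```python
def relu(z):
    return max(0.0, z)


def dag_forward(x, units, output):
    """Evaluate a ReLU network stored as a DAG.

    A source is ("in", j) for input x[j] or ("unit", k) for the k-th hidden unit.
    units: list of (bias, [(source, weight), ...]) in topological order.
    output: (bias, [(source, weight), ...]), combined linearly without activation.
    """
    values = []

    def value(src):
        kind, j = src
        return x[j] if kind == "in" else values[j]

    for bias, edges in units:
        values.append(relu(bias + sum(w * value(s) for s, w in edges)))
    bias, edges = output
    return bias + sum(w * value(s) for s, w in edges)


def mlp_as_dag(weights, biases, out_w, out_b):
    """Convert a layered MLP into the DAG format: each layer sees only the previous one."""
    units = []
    prev = [("in", j) for j in range(len(weights[0][0]))]
    for W_l, b_l in zip(weights, biases):
        layer = []
        for row, b in zip(W_l, b_l):
            layer.append(("unit", len(units)))  # index this unit will occupy
            units.append((b, list(zip(prev, row))))
        prev = layer
    return units, (out_b, list(zip(prev, out_w)))


# One hidden layer with two units on two inputs.
units, out = mlp_as_dag([[[1.0, -1.0], [0.5, 0.5]]], [[0.0, 0.0]], [1.0, 2.0], 0.5)
print(dag_forward([2.0, 1.0], units, out))  # 4.5

# A skip edge from input 0 straight to the output: still feedforward, no longer an MLP.
skip_out = (out[0], out[1] + [(("in", 0), 1.0)])
print(dag_forward([2.0, 1.0], units, skip_out))  # 6.5
```

The final example adds a single edge from an input directly to the output. The result is still feedforward but is no longer a multilayer perceptron.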

Theorems & Definitions (42)

Example 1: Average treatment effect
Example 2: Conditional average treatment effect
Example 3: Dynamic treatment effect with two exposures
Example 4: Controlled direct effect
Theorem 3.1
Remark 1: Nested doubly robust regression and generalized Neyman orthogonality
Theorem 3.2
Theorem 3.3
Remark 2: Convergence rate
Lemma 4.1
...and 32 more
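Examples 3 and 4 call for nested regressions across exposures. For Example 3, the block below sketches a sequential doubly robust estimate of the mean potential outcome $E[Y(a_1,a_2)]$ with binary covariates. It follows the standard two-exposure doubly robust formula rather than the paper's exact construction. Cell means replace DNNs, and the helper names are hypothetical. The two stages are:

- Stage 1 regresses $Y$ on the full history among units that follow $(a_1,a_2)$.
- Stage 2 regresses the stage-1 fit on $X_1$ among units with $A_1=a_1$, so its target is itself an estimated quantity.

```python
import random
from collections import defaultdict


def fit_mean(rows, key, value):
    """Cell-mean regression of value(r) on key(r); a stand-in for a DNN fit."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[key(r)] += value(r)
        counts[key(r)] += 1
    return {k: sums[k] / counts[k] for k in sums}


def sdr_dte(data, a1=1, a2=1):
    """Sequential doubly robust estimate of E[Y(a1, a2)]; rows are (x1, A1, x2, A2, y)."""
    # Stage 1: mu2(x1, x2) = E[Y | x1, A1 = a1, x2, A2 = a2].
    mu2 = fit_mean([r for r in data if r[1] == a1 and r[3] == a2],
                   lambda r: (r[0], r[2]), lambda r: r[4])
    # Stage 2: mu1(x1) = E[mu2(x1, X2) | x1, A1 = a1]; the response is the stage-1 fit.
    mu1 = fit_mean([r for r in data if r[1] == a1],
                   lambda r: r[0], lambda r: mu2[(r[0], r[2])])
    # Propensities of following the regime at each exposure.
    pi1 = fit_mean(data, lambda r: r[0], lambda r: float(r[1] == a1))
    pi2 = fit_mean([r for r in data if r[1] == a1],
                   lambda r: (r[0], r[2]), lambda r: float(r[3] == a2))
    total = 0.0
    for x1, A1, x2, A2, y in data:
        term = mu1[x1]
        if A1 == a1:
            term += (mu2[(x1, x2)] - mu1[x1]) / pi1[x1]
            if A2 == a2:
                term += (y - mu2[(x1, x2)]) / (pi1[x1] * pi2[(x1, x2)])
        total += term
    return total / len(data)


# Simulated two-exposure data; by construction E[Y(1, 1)] = 4.8.
gen = random.Random(3)
data = []
for _ in range(40000):
    x1 = int(gen.random() < 0.5)
    A1 = int(gen.random() < 0.3 + 0.4 * x1)
    x2 = int(gen.random() < 0.2 + 0.3 * A1 + 0.3 * x1)
    A2 = int(gen.random() < 0.3 + 0.4 * x2)
    y = 1 + x1 + 2 * x2 + A1 + A2 + gen.gauss(0, 1)
    data.append((x1, A1, x2, A2, y))

theta = sdr_dte(data)
print(round(theta, 2))
```

Here the cell-mean fits are saturated and computed on the same sample. As a result, both correction terms average to zero up to rounding, and the output coincides with the plug-in nested regression. The corrections become active when the stage-wise regressions are smoother learners, such as DNNs, fitted on separate folds.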
