Causal-StoNet: Causal Inference for High-Dimensional Complex Data

Yaxin Fang, Faming Liang

TL;DR

A novel causal inference approach built on recently developed deep learning techniques, including sparse deep learning theory and stochastic neural networks, that addresses both high dimensionality and an unknown data generation process in a coherent way.

Abstract

With the advancement of data science, the collection of increasingly complex datasets has become commonplace. In such datasets, the data dimension can be extremely high, and the underlying data generation process can be unknown and highly nonlinear. As a result, the task of making causal inference with high-dimensional complex data has become a fundamental problem in many disciplines, such as medicine, econometrics, and social science. However, existing methods for causal inference are frequently developed under the assumption that the data dimension is low or that the underlying data generation process is linear or approximately linear. To address these challenges, this paper proposes a novel causal inference approach for dealing with high-dimensional complex data. The proposed approach is based on deep learning techniques, including sparse deep learning theory and stochastic neural networks, that have been developed in recent literature. By using these techniques, the proposed approach can address both the high dimensionality and the unknown data generation process in a coherent way. Furthermore, the proposed approach can also be used when missing values are present in the datasets. Extensive numerical studies indicate that the proposed approach outperforms existing ones.

Paper Structure

This paper contains 46 sections, 10 theorems, 39 equations, 2 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Assume that the mixture Gaussian prior (marprior) is imposed on each connection of the StoNet, Assumptions StoNetAss:1--sparseDNN:Ass2 hold, and $r_n \prec n^{3/16}$. As $n \to \infty$, the following results hold:

Figures (2)

  • Figure 1: Causal-StoNet Structure: the treatment is included as a visible unit (rectangle) in a middle layer, and ${\boldsymbol Y}_2$ denotes the latent variable of that layer but with the unit directly feeding to the treatment rectangle excluded; 'x' represents possible missing values.
  • Figure 2: In-sample MAE and out-of-sample MAE of ATE estimation with varying training sample sizes. In-sample MAE is calculated over the training and validation sets; out-of-sample MAE is calculated over the test set.
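
The MAE reported in Figure 2 can be illustrated with a minimal sketch. It assumes that MAE means the mean absolute difference between the estimated and true average treatment effect (ATE) across replicated datasets; the summary above does not spell out the definition. The function name `ate_mae` and the numbers in the usage lines are hypothetical placeholders, not values from the paper.

```python
from statistics import fmean


def ate_mae(estimated_ates, true_ates):
    """Mean absolute error between estimated and true ATEs.

    Each argument holds one value per replicated dataset.
    Assumption: MAE is the average of |estimate - truth| over datasets.
    """
    if len(estimated_ates) != len(true_ates):
        raise ValueError("estimated_ates and true_ates must have equal length")
    if not estimated_ates:
        raise ValueError("at least one dataset is required")
    return fmean([abs(e - t) for e, t in zip(estimated_ates, true_ates)])


# Hypothetical placeholder values for three replicated datasets.
example_estimates = [1.0, 2.0, 4.0]
example_truths = [1.5, 2.0, 3.0]
example_mae = ate_mae(example_estimates, example_truths)
```

Under this reading, the in-sample figure would apply the function to ATE estimates computed on the training and validation sets, and the out-of-sample figure to estimates computed on the test set.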

Theorems & Definitions (13)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • Lemma 2
  • Lemma A1
  • Lemma A2
  • Lemma A3
  • Lemma A4
  • proof
  • Lemma A5
  • ...and 3 more