DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation

Qinshuo Liu; Zixin Wang; Xi-An Li; Xinyao Ji; Lei Zhang; Lin Liu; Zhonghua Liu

TL;DR

DNA-SE introduces a bi-level optimization framework that uses Deep Neural Networks to numerically solve Fredholm integral equations arising in semiparametric estimation, enabling scalable estimation of a parameter of interest $\theta$ while handling nuisance components $\eta$. The method, implemented as neural-semipar, alternates between updating the operator-equation solution $\mathsf{b}$ via a DNN and updating $\theta$ via projected score equations, with a training loss that couples score fidelity and operator-solution accuracy. Through MNAR regression, causal-sensitivity analysis, and covariate-shift scenarios, the authors demonstrate competitive finite-sample performance, robustness to hyperparameters, and advantages over traditional polynomial/basis-based solvers; they also validate the approach on a real Connecticut CBCL dataset and release their code. The work provides a theoretical asymptotic result, showing $\sqrt{n}(\widehat{\theta}-\theta)$ converges to an efficient influence-function representation, and discusses connections to computerized semiparametric statistics and future extensions to graphical models and symbolic proofs. Overall, DNA-SE offers a scalable, data-driven numerical solver for semiparametric inference with practical impact for causal inference and missing-data problems.
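The alternating scheme summarized above can be illustrated with a deliberately tiny toy problem. This is a sketch under stated assumptions, not the authors' neural-semipar: a one-parameter model $\mathsf{b}(x) = w x$ stands in for the DNN, the integral equation $\mathsf{b}(x) - \theta \int_0^1 x t\, \mathsf{b}(t)\, dt = x$ and the estimating equation $\frac{1}{n}\sum_i \mathsf{b}(x_i)(y_i - \theta x_i) = 0$ are hypothetical, and the name `alternating_fit` and the parameter `freq` (here taken to mean one $\theta$ update per `freq` updates of $\mathsf{b}$) are illustrative choices.

```python
def alternating_fit(xs, ys, steps=2000, freq=5, lr_b=0.5, lr_theta=0.5):
    """Toy alternating minimization (hypothetical stand-in for a DNN solver).

    Inner variable: w, parametrizing b(x) = w * x.
    Outer variable: theta, the parameter of interest.
    """
    n = len(xs)
    m2 = sum(x * x for x in xs) / n                 # sample mean of x^2
    mxy = sum(x * y for x, y in zip(xs, ys)) / n    # sample mean of x*y
    theta, w = 0.0, 1.0
    for step in range(1, steps + 1):
        # Integral-equation residual for b(x) = w*x:
        #   b(x) - theta * int_0^1 x t b(t) dt - x = (w*(1 - theta/3) - 1) * x
        shrink = 1.0 - theta / 3.0
        resid = w * shrink - 1.0
        # Gradient step on the residual loss int_0^1 (resid*x)^2 dx = resid^2 / 3
        w -= lr_b * 2.0 * resid * shrink / 3.0
        if step % freq == 0:
            # Score-type estimating function with the current b plugged in
            score = w * (mxy - theta * m2)
            # Gradient step on the squared score with respect to theta
            theta -= lr_theta * 2.0 * score * (-w * m2)
    return theta, w


# Noiseless toy data generated with theta = 0.6 (assumed for illustration)
xs = [i / 50 for i in range(1, 51)]
ys = [0.6 * x for x in xs]
theta_hat, w_hat = alternating_fit(xs, ys)
```

In this toy setting the integral equation has the closed-form solution $w = 1/(1 - \theta/3)$, so the output can be checked by hand. In realistic problems no closed form is available, which is the situation a flexible function approximator is meant to handle.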

Abstract

Semiparametric statistics plays a pivotal role in a wide range of domains, including missing data, causal inference, and transfer learning. In many settings, semiparametric theory leads to (nearly) statistically optimal procedures that nonetheless involve numerically solving Fredholm integral equations of the second kind. Traditional numerical methods, such as polynomial or spline approximations, are difficult to scale to multi-dimensional problems. Alternatively, statisticians may choose to approximate the original integral equations by ones with closed-form solutions, resulting in computationally more efficient, but statistically suboptimal or even incorrect procedures. To bridge this gap, we propose a novel framework that formulates the semiparametric estimation problem as a bi-level optimization problem; we then develop a scalable algorithm called Deep Neural-Nets Assisted Semiparametric Estimation (DNA-SE), which leverages the universal approximation property of Deep Neural-Nets (DNN) to streamline semiparametric procedures. Through extensive numerical experiments and a real data analysis, we demonstrate the numerical and statistical advantages of DNA-SE over traditional methods. To the best of our knowledge, we are the first to bring DNN into semiparametric statistics as a numerical solver of integral equations in our proposed general framework.
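For readers unfamiliar with the term, a Fredholm integral equation of the second kind has the generic form $\mathsf{b}(x) = f(x) + \lambda \int K(x, t)\, \mathsf{b}(t)\, dt$, with the unknown function $\mathsf{b}$ appearing both inside and outside the integral. The sketch below solves a one-dimensional instance with a classical grid-based method (trapezoid quadrature plus fixed-point iteration), which is not the DNN approach of the paper. The function name, the kernel $K(x, t) = x t$, and $f(x) = x$ are assumptions chosen so that the exact solution $\mathsf{b}(x) = 1.5x$ is known.

```python
def solve_fredholm_second_kind(kernel, f, lam=1.0, n=101, iters=60):
    """Approximate b on a uniform grid over [0, 1] for
        b(x) = f(x) + lam * int_0^1 kernel(x, t) * b(t) dt
    using trapezoid quadrature and fixed-point (Neumann) iteration.
    The iteration converges only when lam * kernel is a contraction.
    """
    h = 1.0 / (n - 1)
    grid = [i * h for i in range(n)]
    weights = [h] * n
    weights[0] = weights[-1] = h / 2.0   # trapezoid end-point weights
    # Pre-multiply the kernel by the quadrature weights once
    kw = [[kernel(x, t) * wt for t, wt in zip(grid, weights)] for x in grid]
    fx = [f(x) for x in grid]
    b = fx[:]                            # start the iteration from b = f
    for _ in range(iters):
        b = [fx[i] + lam * sum(k * bj for k, bj in zip(kw[i], b))
             for i in range(n)]
    return grid, b


# Toy instance with known solution b(x) = 1.5 * x
grid, b = solve_fredholm_second_kind(lambda x, t: x * t, lambda x: x)
```

The grid has `n` points per dimension, so the cost of this kind of discretization grows quickly with the dimension of $x$, which is the scaling difficulty the abstract attributes to traditional solvers.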

Paper Structure (33 sections, 1 theorem, 36 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Semiparametric statistics, operator equations and the general setup
A brief review of semiparametric statistics
Problem setup
The algorithm
Simulation studies
Example 1: Estimating parameters under missing-not-at-random ($\texttt{MNAR}$)
tuning of hyperparameters
estimation
Example 2: Sensitivity analysis in causal inference
tuning of hyperparameters
estimation
$\alpha$
Covariate shift posterior drift
Tuning of hyperparameters
...and 18 more sections

Key Result

Theorem 4

Let $\widehat{\theta}$ be the output of $\texttt{neural-semipar}$ described in Algorithm 1. Then, under the regularity conditions given in the appendix, we have $\sqrt{n}(\widehat{\theta}-\theta) \overset{d}{\to} \mathcal{N}(0, \nu^{2})$, where $\nu^{2}$ is the semiparametric efficiency bound.
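Reading the theorem as asymptotic normality of $\sqrt{n}(\widehat{\theta}-\theta)$ with variance $\nu^{2}$, the standard practical consequence is a Wald-type confidence interval $\widehat{\theta} \pm z_{1-\alpha/2}\, \widehat{\nu}/\sqrt{n}$. The sketch below shows this computation for a scalar parameter. The function name `wald_interval` and the plug-in variance estimate `nu2_hat` are hypothetical, and how $\nu^{2}$ would be estimated is not specified here.

```python
import math
from statistics import NormalDist


def wald_interval(theta_hat, nu2_hat, n, level=0.95):
    """Two-sided Wald interval for a scalar parameter, assuming
    sqrt(n) * (theta_hat - theta) is approximately N(0, nu2_hat)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # standard normal quantile
    half_width = z * math.sqrt(nu2_hat / n)
    return theta_hat - half_width, theta_hat + half_width


# Illustrative numbers only (not results from the paper)
lower, upper = wald_interval(theta_hat=1.0, nu2_hat=4.0, n=400)
```

Because $\nu^{2}$ is the efficiency bound, an interval of this form would be the narrowest asymptotically valid one among regular estimators, provided the variance estimate is consistent.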

Figures (6)

Figure 1: Boxplots of the estimates of $\beta_0$ and $\beta_1$ from the neural network method and from polynomial expansion methods of different orders, compared with the oracle parameter values.
Figure 2: Boxplots of the estimates of $\beta$ from the neural network method and from polynomial expansion methods of different orders, compared with the oracle parameter value. Panel (b) is a stretched view of panel (a).
Figure 3: Boxplots of the estimates of $\sqrt{n} (\widehat{r} - r_*)$ obtained using the neural network method and polynomial methods of varying orders, where $\widehat{r}$ is the estimate of $r_*$. The neural network method is denoted by $nn$ and uses the $\texttt{neural-semipar}$ algorithm, while the polynomial methods are denoted by $poly_i$, where $i$ is the order of the expansion.
Figure 4: Density histograms of the estimates $\widehat{\beta}_0,\widehat{\beta}_1,\widehat{\beta}_2$ obtained by different methods.
Figure 5: The evolution of the training process on one simulated dataset from Example 1 (MNAR), with the alternating frequency $\lambda$ set to $1, 5, 10$, or $20$. The $\mathsf{x}$-axis is the number of iterations. In both (a) and (b) (which is (a) without $\lambda=1$), the upper panels show the evolution of $\beta_{0, 1}$ (left) and $\beta_{0, 2}$ (right) over iterations, while the lower panels show the evolution of the loss corresponding to the score equation (left) and the Fredholm integral equation (right) over iterations.
...and 1 more figures

Theorems & Definitions (10)

Example 1
Example 2
Example 3
Example 1: continued
Example 2: continued
Example 3: continued
Remark 1
Remark 3
Theorem 4
Remark 5
