DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation
Qinshuo Liu, Zixin Wang, Xi-An Li, Xinyao Ji, Lei Zhang, Lin Liu, Zhonghua Liu
TL;DR
DNA-SE introduces a bi-level optimization framework that uses deep neural networks (DNNs) to numerically solve Fredholm integral equations arising in semiparametric estimation, enabling scalable estimation of a parameter of interest $\theta$ while handling nuisance components $\eta$. The method, implemented as neural-semipar, alternates between updating the operator-equation solution $\mathsf{b}$ via a DNN and updating $\theta$ via projected score equations, with a training loss that couples score fidelity and operator-solution accuracy. Through missing-not-at-random (MNAR) regression, causal-sensitivity analysis, and covariate-shift scenarios, the authors demonstrate competitive finite-sample performance, robustness to hyperparameters, and advantages over traditional polynomial/basis-based solvers; they also validate the approach on a real Connecticut CBCL dataset and release their code. The work provides a theoretical asymptotic result, showing $\sqrt{n}(\widehat{\theta}-\theta)$ converges to an efficient influence-function representation, and discusses connections to computerized semiparametric statistics and future extensions to graphical models and symbolic proofs. Overall, DNA-SE offers a scalable, data-driven numerical solver for semiparametric inference with practical impact for causal inference and missing-data problems.
Abstract
Semiparametric statistics plays a pivotal role in a wide range of domains, including missing data, causal inference, and transfer learning. In many settings, semiparametric theory leads to (nearly) statistically optimal procedures that nevertheless require numerically solving Fredholm integral equations of the second kind. Traditional numerical methods, such as polynomial or spline approximations, are difficult to scale to multi-dimensional problems. Alternatively, statisticians may approximate the original integral equations by ones with closed-form solutions, resulting in procedures that are computationally more efficient but statistically suboptimal or even incorrect. To bridge this gap, we propose a novel framework that formulates the semiparametric estimation problem as a bi-level optimization problem, and we develop a scalable algorithm called Deep Neural-Nets Assisted Semiparametric Estimation (DNA-SE) that leverages the universal approximation property of Deep Neural-Nets (DNN) to streamline semiparametric procedures. Through extensive numerical experiments and a real data analysis, we demonstrate the numerical and statistical advantages of DNA-SE over traditional methods. To the best of our knowledge, we are the first to bring DNN into semiparametric statistics as a numerical solver of integral equations in our proposed general framework.
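For readers unfamiliar with the kind of equation involved, a Fredholm integral equation of the second kind has the form $b(x) = f(x) + \lambda \int K(x,t)\, b(t)\, dt$, where $b$ is the unknown function. The sketch below is not the DNA-SE solver; it is a classical grid-based fixed-point (Picard) iteration on a toy one-dimensional problem, included only to make the equation concrete. The kernel $K(x,t) = xt$, the right-hand side $f(x) = 2x/3$, the function name `solve_fredholm_second_kind`, and all parameter names are illustrative assumptions chosen so that the exact solution is $b(x) = x$. The iteration converges here only because $\lambda K$ is a contraction.

```python
def solve_fredholm_second_kind(kernel, f, lam=1.0, n_nodes=200,
                               tol=1e-12, max_iter=500):
    """Approximate b in b(x) = f(x) + lam * int_0^1 kernel(x, t) b(t) dt.

    Uses the midpoint quadrature rule on [0, 1] and Picard iteration.
    Hypothetical helper for illustration; assumes lam * kernel is a contraction.
    """
    h = 1.0 / n_nodes
    nodes = [(i + 0.5) * h for i in range(n_nodes)]  # midpoint nodes
    f_vals = [f(x) for x in nodes]
    k_vals = [[kernel(x, t) for t in nodes] for x in nodes]

    b = list(f_vals)  # start the iteration from b_0 = f
    for _ in range(max_iter):
        # One Picard step: b_new = f + lam * (quadrature of K * b)
        b_new = [
            f_vals[i] + lam * h * sum(k_vals[i][j] * b[j] for j in range(n_nodes))
            for i in range(n_nodes)
        ]
        change = max(abs(u - v) for u, v in zip(b_new, b))
        b = b_new
        if change < tol:
            break
    return nodes, b


# Toy problem (assumed for illustration): exact solution is b(x) = x.
nodes, b = solve_fredholm_second_kind(lambda x, t: x * t,
                                      lambda x: 2.0 * x / 3.0)
max_err = max(abs(bi - xi) for xi, bi in zip(nodes, b))
print("max abs error vs. exact solution:", max_err)
```

A grid of this kind needs a number of nodes that grows exponentially with the dimension of $x$, which is one way to see why the abstract describes traditional solvers as difficult to scale to multi-dimensional problems.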
