Deep non-parametric logistic model with case-control data and external summary information

Hengchao Shi, Ming Zheng, Wen Yu

TL;DR

This work considers the estimation of a non-parametric logistic model from case-control data supplemented by external summary information, and derives a non-asymptotic error bound for the proposed estimator.

Abstract

The case-control sampling design is a pivotal strategy for mitigating the class imbalance observed in binary data. We consider the estimation of a non-parametric logistic model from case-control data supplemented by external summary information. Incorporating the external summary information ensures the identifiability of the model. We propose a two-step estimation procedure. In the first step, the external information is used to estimate the marginal case proportion. In the second step, the estimated proportion is used to construct a weighted objective function for parameter training. A deep neural network architecture is employed for functional approximation. We further derive a non-asymptotic error bound for the proposed estimator. From this bound, the convergence rate is obtained and is shown to attain the optimal rate for non-parametric regression estimation. Simulation studies are conducted to evaluate the theoretical findings for the proposed method. A real data example is analyzed for illustration.
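The second step of the procedure can be illustrated with a minimal sketch. The abstract does not give the exact form of the weighted objective, so the weights below are an assumption: the standard inverse-sampling-probability weights, in which cases are weighted by the estimated population case proportion divided by the sample case fraction, and controls analogously. The names `case_control_weights`, `weighted_logistic_loss`, and `p1_hat` are hypothetical, and `g_values` stands for the outputs of whatever network approximates the logit function.

```python
import numpy as np


def case_control_weights(y, p1_hat):
    """Per-observation weights for a case-control sample.

    Assumed form (not taken from the paper): cases get
    p1_hat / (n1 / n) and controls get (1 - p1_hat) / (n0 / n),
    where p1_hat is the first-step estimate of the marginal
    case proportion.
    """
    y = np.asarray(y)
    n = y.size
    n1 = y.sum()          # number of cases
    n0 = n - n1           # number of controls
    w_case = p1_hat / (n1 / n)
    w_ctrl = (1.0 - p1_hat) / (n0 / n)
    return np.where(y == 1, w_case, w_ctrl)


def weighted_logistic_loss(g_values, y, weights):
    """Weighted negative log-likelihood of a logistic model with logit g(x)."""
    g_values = np.asarray(g_values, dtype=float)
    y = np.asarray(y, dtype=float)
    # log(1 + exp(g)) evaluated stably with logaddexp
    nll = np.logaddexp(0.0, g_values) - y * g_values
    return float(np.mean(weights * nll))


# Toy usage: a balanced case-control sample from a population
# in which cases are assumed to make up 10% of subjects.
y = np.array([1, 1, 0, 0])
w = case_control_weights(y, p1_hat=0.1)
loss = weighted_logistic_loss(np.zeros(4), y, w)
```

With these weights the sample is reweighted to mimic the population mix of cases and controls, which is why an estimate of the marginal case proportion is needed before training. In practice `g_values` would come from a deep network whose parameters are updated to minimize this loss.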

Paper Structure (12 sections, 9 theorems, 69 equations, 2 figures, 3 tables)

Key Result

Theorem 1

Under Assumption 4 and Assumption 5, we have that i) $\hat{P}_1\xrightarrow{p}P_1$ as $n\to\infty$, where $\xrightarrow{p}$ denotes convergence in probability, and ii) $\sqrt{n}(\hat{P}_1 - P_1) \xrightarrow{d} N(0,a_1^2 V_1 + a_2^2 V_0 + a_3^2 V)$ as $n\to\infty$, where $a_1, a_2, a_3, V_0$, an

Figures (2)

  • Figure 1: The average of $\widehat{g}_n(x)$ and $\widetilde{g}_n(x)$. The black line corresponds to the true value of $g(x)$; the red dotted line corresponds to the average of $\widehat{g}_n(x)$; the blue dotted line corresponds to the average of $\widetilde{g}_n(x)$.
  • Figure 2: Scatter plots of the case-control data estimator against the full-data estimator. The left panel is for $\widehat{g}_n(x)$ and the right panel is for $\widetilde{g}_n(x)$. The three rows correspond to three case and control sample sizes.

Theorems & Definitions (9)

  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6