Deep learning with missing data

Tianyi Ma, Tengyao Wang, Richard J. Samworth

TL;DR

This work addresses prediction with missing covariates in multivariate nonparametric regression by introducing Pattern Embedded Neural Networks (PENNs). A PENN combines a network trained on imputed data with a dedicated network that encodes observation patterns (revelation vectors) and a fusion network that merges the two representations. The authors establish theoretical guarantees, including an oracle inequality and minimax rates under a piecewise compositional Hölder structure, showing that the PENN estimator attains near-optimal rates across the cells of the observation-pattern partition. They show that the excess risk is controlled by a weighted average of cell-wise rates, up to a poly-logarithmic factor in the sample size, and provide a complementary minimax lower bound; these results hold for arbitrary missingness mechanisms, including missing not at random (MNAR). Empirically, PENNs consistently outperform standard neural networks without pattern embedding on simulated, semi-synthetic and real data, often by large margins. The method is compatible with any imputation technique, and code and a tutorial are publicly available. The approach offers a practical, theoretically grounded way to handle missing data in deep learning, and improves prediction markedly when missingness is informative or pattern-dependent.
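To make the two inputs of a PENN concrete, the following is a minimal sketch of how imputed covariates and revelation vectors might be built from data with missing entries. It assumes mean imputation (the method is stated to work with any imputation technique), encodes missing values as `None`, and uses the convention that an indicator of 1 means "observed". The function name `mean_impute_with_indicators` and the variable names `Z` and `Omega` are illustrative choices, not the authors' code.

```python
def mean_impute_with_indicators(rows):
    """Return (imputed_rows, indicator_rows) for rows that use None for missing.

    Assumptions (not from the paper): mean imputation, 1 = observed, 0 = missing.
    """
    d = len(rows[0])
    # Column means over observed entries only (0.0 if a column is never observed).
    means = []
    for j in range(d):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    # Imputed covariates: keep observed values, fill gaps with the column mean.
    imputed = [[r[j] if r[j] is not None else means[j] for j in range(d)]
               for r in rows]
    # Revelation vectors: one binary indicator per covariate.
    indicators = [[1 if r[j] is not None else 0 for j in range(d)]
                  for r in rows]
    return imputed, indicators


X = [[1.0, None, 3.0],
     [2.0, 5.0, None],
     [None, 7.0, 9.0]]
Z, Omega = mean_impute_with_indicators(X)
print(Z)      # imputed covariates
print(Omega)  # observation patterns
```

A network that sees only `Z` cannot tell an imputed value from a genuinely observed one; passing `Omega` alongside it is what lets the predictor adapt to the observation pattern.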

Abstract

In the context of multivariate nonparametric regression with missing covariates, we propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique. In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation. The outputs are then combined in a third neural network to produce final predictions. Our main theoretical result exploits an assumption that the observation patterns can be partitioned into cells on which the Bayes regression function behaves similarly, and belongs to a compositional Hölder class. It provides a finite-sample excess risk bound that holds for an arbitrary missingness mechanism, and in combination with a complementary minimax lower bound, demonstrates that our PENN estimator attains in typical cases the minimax rate of convergence as if the cells of the partition were known in advance, up to a poly-logarithmic factor in the sample size. Numerical experiments on simulated, semi-synthetic and real data confirm that the PENN estimator consistently improves, often dramatically, on standard neural networks without pattern embedding. Code to reproduce our experiments, as well as a tutorial on how to apply our method, is publicly available.
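The three-network structure described above can be sketched as a forward pass. This is an illustrative sketch only: the layer widths, the random initialisation, the ReLU activations and the use of concatenation to combine the two representations are assumptions made here, training is omitted, and the names `init_mlp`, `mlp_forward` and `penn_forward` are hypothetical rather than taken from the authors' released code.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def init_mlp(sizes, rng):
    """Random weights for a fully connected network with layer widths `sizes`."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.gauss(0.0, 1.0 / n_in ** 0.5) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers


def mlp_forward(layers, x, final_relu=True):
    """Apply affine maps with ReLU between them (and after the last if requested)."""
    for k, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
        if final_relu or k < len(layers) - 1:
            x = relu(x)
    return x


def penn_forward(params, z, omega):
    """Forward pass: imputed-data network, pattern network, then fusion network."""
    h_imputed = mlp_forward(params["imputed"], z)
    h_pattern = mlp_forward(params["pattern"], [float(o) for o in omega])
    # Assumption: the two representations are combined by concatenation.
    fused_input = h_imputed + h_pattern
    return mlp_forward(params["fusion"], fused_input, final_relu=False)[0]


rng = random.Random(0)
d = 3  # number of covariates
params = {
    "imputed": init_mlp([d, 8, 4], rng),      # network on imputed covariates
    "pattern": init_mlp([d, 4, 2], rng),      # compact embedding of the pattern
    "fusion": init_mlp([4 + 2, 8, 1], rng),   # combines both representations
}
y_hat = penn_forward(params, [1.0, 6.0, 3.0], [1, 0, 1])
print(y_hat)
```

Keeping the pattern embedding low-dimensional (width 2 here) mirrors the idea of a compact representation of the observation indicators; in practice all three networks would be trained jointly on the prediction loss.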

Paper Structure

This paper contains 35 sections, 14 theorems, 134 equations, 9 figures.

Key Result

Theorem 1

In the setting of Section sec:setup, assume that $\|Y_0\|_{\psi_2} \leq \xi$ for some $\xi\geq 1$. Let $\widetilde{f}$ be a neural network estimator in $\mathcal{F}\subseteq \mathcal{F}(L,\bm p, s)$ based on data $\mathcal{D} \coloneqq (\bm Z_i,\bm \Omega_i,Y_i)_{i=1}^n$ and let $B_n \coloneqq \xi\s

Figures (9)

  • Figure 1: An illustration of Example 1, and the outputs of neural networks trained without and with the revelation vectors.
  • Figure 2: An illustration of the class $\mathcal{F}_{\mathrm{PENN}}\biggl( (L_1,\bm{p}_1), (L_3,\bm{p}_3), (L_2,\bm{p}_2), s \biggr)$.
  • Figure 3: Estimates of excess risks for simulated data Models 1--4. The PENN estimators are shown in red, with the vanilla neural networks in blue; on the $x$-axis, the abbreviation of the imputation technique appears after the underscore symbol.
  • Figure 4: First 16 images (after mean imputation) from the MNIST dataset with MCAR and MNAR missingness, together with the true labels above each panel.
  • Figure 5: Misclassification error (MCE) for bank loan dataset with different missingness mechanisms.
  • ...and 4 more figures

Theorems & Definitions (35)

  • Example 1
  • Theorem 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Proposition 2
  • Theorem 3
  • Theorem 4
  • Definition 5
  • ...and 25 more