Asymptotics of Nonparametric Estimation under general non-monotone MAR missingness: A Bayesian Approach

Badr-Eddine Chérief-Abdellatif, Jeffrey Näf

Abstract

Missing values are ubiquitous in (data) science, with potentially detrimental consequences for any statistical analysis. In response, a wealth of methods and theoretical results has been developed in recent years. Still, many questions remain open, in particular in the case of general non-monotone missing at random (MAR). In this work, we extend nonparametric Bayesian theory to this MAR setting. We introduce a general theorem of posterior contraction under MAR and an additional mild positivity condition. Using this result, we show that, despite the missing values, the density of the uncontaminated data can be estimated with the minimax posterior contraction rate up to log factors. To the best of our knowledge, this is the first nonparametric result showing that the uncontaminated distribution can be consistently estimated under Rubin's MAR definition. As a consequence, we obtain an algorithm that takes data contaminated with missing values and returns a sample from a provably consistent estimate of the uncontaminated distribution.

Paper Structure (20 sections, 16 theorems, 149 equations, 5 figures, 6 algorithms)

Key Result

Proposition 4.0

Under Assumptions asm_true_MDM and asm_no_empty_MDM, the map $\theta \mapsto \widetilde{\textnormal{KL}}(P_{\theta^*} \| P_{\theta})$, the KL divergence relative to $\mathbb{P}_{\theta^*}$, is minimized at $\theta^*$.

Figures (5)

  • Figure 1: Three data matrices with missing values, each with three different patterns. Each contains the fully observed pattern $M=0$ and does not contain the completely unobserved pattern $M=\mathbbm{1}$.
  • Figure 2: Estimation of the mean of $X_1$ (left) and of the correlation between $X_1$ and $X_2$ (right) under the MAR mechanism in \ref{eq_MARmissing0}.
  • Figure 3: Example with $P_{\theta^*}$ chosen to be $\mathcal{N}\left(0,\Sigma\right)$. Top: quantile estimate of $X_1$ for $n=500$ (left) and $n=1000$ (right). Bottom: negative energy distance between the newly generated sample/imputation and the full data for $n=500$ (left) and $n=1000$ (right).
  • Figure 4: Example with $P_{\theta^*}$ chosen to be a mixture of Gaussians with different means and a correlation of $0.7$ between $X_1$ and $X_2$. Top: quantile estimate of $X_1$ for $n=500$ (left) and $n=1000$ (right). Bottom: negative energy distance between the newly generated sample/imputation and the full data for $n=500$ (left) and $n=1000$ (right).
  • Figure 5: Example with $P_{\theta^*}$ chosen to be uniform on $[0,1]^3$ with correlation between $X_1$ and $X_2$ induced by a copula. Left: quantile estimate of $X_1$. Right: negative energy distance between the newly generated sample/imputation and the full data. We used $n=1000$.
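
Figures 3 to 5 score generated samples by their negative energy distance to the full data. As a rough illustration of what such a score measures, the sketch below computes the standard sample energy distance, $2\,\mathbb{E}\|X-Y\| - \mathbb{E}\|X-X'\| - \mathbb{E}\|Y-Y'\|$, and negates it so that larger values mean a closer match. The function names, the use of the plain V-statistic estimator, and the Euclidean norm are assumptions made for this example; the exact estimator and scaling used for the figures are not stated on this page.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (row of a, row of b)."""
    diffs = a[:, None, :] - b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).mean()


def negative_energy_distance(x, y):
    """Negative V-statistic energy distance between two samples.

    Hypothetical helper for illustration only. Values are <= 0, and
    values closer to 0 indicate that the two samples look more alike.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D inputs as samples of a single variable.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    energy = (
        2.0 * mean_pairwise_distance(x, y)
        - mean_pairwise_distance(x, x)
        - mean_pairwise_distance(y, y)
    )
    return -energy


rng = np.random.default_rng(0)
full_data = rng.normal(size=(200, 2))        # stand-in for the full data
good_sample = rng.normal(size=(200, 2))      # same distribution
bad_sample = rng.normal(size=(200, 2)) + 3.0  # shifted distribution

score_good = negative_energy_distance(good_sample, full_data)
score_bad = negative_energy_distance(bad_sample, full_data)
```

Under this convention, `score_good` should sit close to zero while `score_bad` is markedly more negative, which is the sense in which a higher score indicates a better generated sample.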

Theorems & Definitions (27)

  • Definition 4.1: KL relative to $\mathbb{P}_{\theta^*}$
  • Proposition 4.0: KL relative to $\mathbb{P}_{\theta^*}$
  • Proposition 4.0: $\KLMAR$ Divergence Nature
  • Theorem 4.1: General Rate Result
  • Theorem 4.2
  • Proposition 4.3
  • Theorem 5.1: Density Estimation Rate
  • Proposition B.0: Hellinger bounds
  • proof
  • Proposition B.0: KL relative to $\mathbb{P}_{\theta^*}$
  • ...and 17 more