Asymptotics of Nonparametric Estimation under general non-monotone MAR missingness: A Bayesian Approach

Badr-Eddine Chérief-Abdellatif; Jeffrey Näf

Abstract

Missing values are ubiquitous in (data) science, with potentially detrimental consequences for any statistical analysis. In response, a wealth of methods and theoretical results has been developed in recent years. Still, many questions remain open, in particular in the case of general non-monotone missing at random (MAR) missingness. In this work, we extend nonparametric Bayesian theory to this MAR setting. We introduce a general posterior contraction theorem that holds under MAR and an additional mild positivity condition. Using this result, we show that, despite the missing values, the density of the uncontaminated data can be estimated at the minimax posterior contraction rate up to log factors. To the best of our knowledge, this is the first nonparametric result showing that the uncontaminated distribution can be consistently estimated under Rubin's MAR definition. As a consequence, we obtain an algorithm that takes data contaminated with missing values and returns a sample from a provably consistent estimate of the uncontaminated distribution.
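To make the setting concrete, the following is a minimal sketch of a non-monotone MAR mechanism on two variables. It is an illustration only and not the mechanism or algorithm used in the paper: the function name `sample_mar`, the Gaussian data, the logistic link, and the scaling constant 0.4 are all assumptions chosen for this example. The probability of each pattern depends only on the coordinates that pattern leaves observed, and the fully observed pattern keeps probability at least 0.2 everywhere, which is meant to evoke a positivity-type condition (the exact condition is not stated in the abstract).

```python
import numpy as np


def sample_mar(n, seed=0):
    """Draw n points in R^2 and mask them with a non-monotone MAR mechanism.

    Hypothetical illustration: three patterns are possible,
      (0, 0): both coordinates observed,
      (1, 0): X1 missing, with probability depending only on X2,
      (0, 1): X2 missing, with probability depending only on X1.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 2))          # complete (uncontaminated) data
    sig = 1.0 / (1.0 + np.exp(-x))       # logistic transform, values in (0, 1)

    # Pattern probabilities; each uses only the coordinate that stays observed.
    p_miss_x1 = 0.4 * sig[:, 1]          # P(M = (1, 0) | x) depends on x2 only
    p_miss_x2 = 0.4 * sig[:, 0]          # P(M = (0, 1) | x) depends on x1 only
    # P(M = (0, 0) | x) = 1 - p_miss_x1 - p_miss_x2 >= 0.2 for every x.

    # Sample exactly one pattern per row from a single uniform draw.
    u = rng.uniform(size=n)
    m = np.zeros((n, 2), dtype=int)
    m[u < p_miss_x1, 0] = 1
    m[(u >= p_miss_x1) & (u < p_miss_x1 + p_miss_x2), 1] = 1

    x_obs = np.where(m == 1, np.nan, x)  # masked data as seen by the analyst
    return x, m, x_obs


if __name__ == "__main__":
    x, m, x_obs = sample_mar(1000)
    print("share fully observed:", (m.sum(axis=1) == 0).mean())
    print("share with X1 missing:", m[:, 0].mean())
    print("share with X2 missing:", m[:, 1].mean())
```

The patterns are non-monotone because neither missing coordinate's pattern is nested in the other: some rows lose only X1 and others lose only X2.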

Paper Structure (20 sections, 16 theorems, 149 equations, 5 figures, 6 algorithms)

Introduction
Background and Notation
Missingness at Random
Ignorability of likelihood-based inference under MAR
Notation
Problem Statement and Contributions
Related Literature
Contributions
Posterior concentration rates under MAR
Definitions
General Posterior Contraction Results
Posterior Contraction Results in Hellinger distance
Density Estimation under MAR missingness
Motivating Example
Density Estimation on ${\mathbb R}^d$
...and 5 more sections

Key Result

Proposition 4.0

Under Assumptions asm_true_MDM and asm_no_empty_MDM, the KL divergence relative to $\mathbb{P}_{\theta^*}$ satisfies and In particular $\theta \mapsto \widetilde{\textnormal{KL}}(P_{\theta^*} \| P_{\theta})$ is minimized at $\theta^*$.

Figures (5)

Figure 1: Three data matrices with missing values, each with three different patterns. Each contains the fully observed pattern $M=0$ and does not contain the completely unobserved pattern $M = \mathbbm{1}$.
Figure 2: Estimation of the mean of $X_1$ (left) and of the correlation between $X_1$ and $X_2$ (right) under the MAR mechanism in \ref{eq_MARmissing0}.
Figure 3: Example with $P_{\theta^*}$ chosen to be $\mathcal{N}\left(0,\Sigma\right)$. Top: quantile estimate of $X_1$ for $n=500$ (left) and $n=1000$ (right). Bottom: negative energy distance between the newly generated sample/imputation and the full data for $n=500$ (left) and $n=1000$ (right).
Figure 4: Example with $P_{\theta^*}$ chosen to be a mixture of Gaussians with different means and a correlation of $0.7$ between $X_1$ and $X_2$. Top: quantile estimate of $X_1$ for $n=500$ (left) and $n=1000$ (right). Bottom: negative energy distance between the newly generated sample/imputation and the full data for $n=500$ (left) and $n=1000$ (right).
Figure 5: Example with $P_{\theta^*}$ chosen to be uniform on $[0,1]^3$ with correlation between $X_1$ and $X_2$ induced by a copula. Left: quantile estimate of $X_1$. Right: negative energy distance between the newly generated sample/imputation and the full data. We used $n=1000$.

Theorems & Definitions (27)

Definition 4.1: KL relative to $\mathbb{P}_{\theta^*}$
Proposition 4.0: KL relative to $\mathbb{P}_{\theta^*}$
Proposition 4.0: $\KLMAR$ Divergence Nature
Theorem 4.1: General Rate Result
Theorem 4.2
Proposition 4.3
Theorem 5.1: Density Estimation Rate
Proposition B.0: Hellinger bounds
proof
Proposition B.0: KL relative to $\mathbb{P}_{\theta^*}$
...and 17 more
