Consistent Empirical Bayes Estimation of the Mean of a Mixing Distribution with Applications to Treatment of Nonresponse
Eitan Greenshtein
TL;DR
This work addresses the challenge of estimating functionals η_G = E_G[η(θ)] in nonparametric Empirical Bayes settings when the mixing distribution G may be non-identifiable, particularly under missing-not-at-random (MNAR) nonresponse. It advocates using a Generalized Maximum Likelihood Estimator (GMLE) \tilde G of G and analyzes the plug-in estimator η_{\tilde G} = E_{\tilde G}[η(θ)], showing that η_{\tilde G} can consistently estimate η_G even when G is not identifiable. The author establishes asymptotic uniqueness of η_{\tilde G} across GMLEs in both truncated and censored data frameworks and proves convergence of η_{\tilde G} to η_G under mild, MAR-like assumptions without requiring identifiability. Numerical experiments illustrate robust performance of the GMLE-based estimator in MNAR settings, and the paper discusses extensions to weighted averages and to incorporating covariates via super-population constructs. Overall, the results offer a principled approach to correcting nonresponse bias in complex data that relies on functionals of the latent mixing distribution rather than on identifiability of G.
Abstract
We consider a Nonparametric Empirical Bayes (NPEB) framework. Let $Y_i$ be random variables, $Y_i \sim f(y|θ_i)$, $i=1,...,n$, where $θ_i \in Θ$ are independent and $θ_i \sim G$. The variables $Y_i$ are conditionally independent given $θ_i, \; i=1,...,n$. The mixing distribution $G$ is unknown and assumed to belong to a nonparametric class $\{G \}$. Let $η(θ)$ be a function of $θ$. We address the problem of consistently estimating $E_G η(θ) \equiv η_G$. This problem becomes particularly challenging when $G$ cannot be consistently estimated from the observed data. We motivate this problem, especially in contexts involving nonresponse and missing data. For such cases, a consistent estimation method is suggested and its performance is demonstrated through simulations.
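To make the plug-in idea concrete, the following is a minimal sketch, not the paper's procedure. It assumes a fully observed normal location model $Y_i | θ_i \sim N(θ_i, 1)$, a two-point $G$, and a fixed grid approximation of the mixing distribution fitted by EM; the model, the grid, and the names `npmle_grid_em` and `eta_hat` are all illustrative assumptions. It does not cover the nonresponse or non-identifiable settings that motivate the paper; it only shows how an estimate of $η_G$ is read off a fitted mixing distribution.

```python
import numpy as np

def npmle_grid_em(y, grid, n_iter=500):
    """Grid-based approximation to a nonparametric MLE of G.

    Assumed model (illustrative only): Y | theta ~ N(theta, 1).
    Returns weights w_j on the grid points, summing to 1.
    """
    # lik[i, j] = f(y_i | grid_j), the N(grid_j, 1) density at y_i.
    lik = np.exp(-0.5 * (y[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
    w = np.full(grid.size, 1.0 / grid.size)  # start from uniform weights
    for _ in range(n_iter):
        post = lik * w                           # unnormalized posterior over grid
        post /= post.sum(axis=1, keepdims=True)  # normalize per observation
        w = post.mean(axis=0)                    # EM update of mixing weights
    return w

def eta_hat(w, grid, eta=lambda t: t):
    """Plug-in estimate of eta_G = E_G[eta(theta)] under the fitted weights."""
    return float(np.sum(w * eta(grid)))

# Simulated data: G puts mass 1/2 on each of -1 and 2, so E_G[theta] = 0.5.
rng = np.random.default_rng(0)
theta = rng.choice([-1.0, 2.0], size=2000, p=[0.5, 0.5])
y = theta + rng.standard_normal(theta.size)

grid = np.linspace(-5.0, 7.0, 121)
w = npmle_grid_em(y, grid)
est = eta_hat(w, grid)  # plug-in estimate of the mean of G
print(est)
```

Passing a different `eta`, for example `eta=lambda t: t**2`, gives the plug-in estimate of another functional of $G$ from the same fitted weights.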
