Neural-g: A Deep Learning Framework for Mixing Density Estimation
Shijie Wang, Saptarshi Chakraborty, Qian Qin, Ray Bai
TL;DR
This paper introduces neural-$g$, a deep neural network framework for estimating mixing densities in empirical Bayes $g$-modeling. By outputting a valid probability mass function (PMF) over a fixed grid $\Theta_m$ via a softmax layer, neural-$g$ flexibly captures a wide range of priors, including discrete, continuous, flat, heavy-tailed, and bounded shapes. The authors establish a universal approximation theorem for neural networks that output PMFs through a softmax layer, propose a weighted average gradient optimizer to accelerate training, and extend the method to multivariate priors. Through extensive simulations and real-data applications (Poisson mixtures and measurement-error models), neural-$g$ demonstrates competitive accuracy and reliable uncertainty quantification, often outperforming the nonparametric maximum likelihood estimator (NPMLE) and Efron’s $g$ when priors are non-smooth or contain flat regions. A publicly available software package supports implementation, enabling broad use in empirical Bayes and latent-mixture problems.
Abstract
Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a valid probability density. Under default hyperparameters, we show that neural-$g$ is very flexible and capable of capturing many unknown densities, including those with flat regions, heavy tails, and/or discontinuities. In contrast, existing methods struggle to capture all of these prior shapes. We provide justification for neural-$g$ by establishing a new universal approximation theorem regarding the capability of neural networks to learn arbitrary probability mass functions. To accelerate convergence of our numerical implementation, we utilize a weighted average gradient descent approach to update the network parameters. Finally, we extend neural-$g$ to multivariate prior density estimation. We illustrate the efficacy of our approach through simulations and analyses of real datasets. A software package to implement neural-$g$ is publicly available at https://github.com/shijiew97/neuralG.
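To make the softmax construction concrete, the following is a minimal sketch of $g$-modeling on a fixed grid for a Poisson mixture. It is not the neuralG package and not the authors' algorithm: it optimizes free logits directly instead of the output of a neural network, and it uses plain gradient ascent instead of the weighted average gradient scheme. The function names (`softmax`, `poisson_pmf`, `marginal_loglik`, `fit_logits`, `sample_poisson`), the grid, the step size, and the number of steps are all illustrative assumptions. The sketch only shows that passing logits through a softmax yields nonnegative prior weights that sum to one, and that these weights can be fit by maximizing the marginal likelihood $\sum_i \log \sum_j w_j f(y_i \mid \theta_j)$.

```python
import math
import random


def softmax(logits):
    """Map arbitrary real logits to nonnegative weights that sum to one."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def poisson_pmf(y, theta):
    """Poisson likelihood f(y | theta); requires theta > 0."""
    return math.exp(y * math.log(theta) - theta - math.lgamma(y + 1))


def marginal_loglik(weights, lik):
    """Sum over i of log(sum_j w_j * lik[i][j]), with lik[i][j] = f(y_i | theta_j)."""
    return sum(
        math.log(sum(w * l for w, l in zip(weights, row))) for row in lik
    )


def fit_logits(ys, grid, steps=200, lr=0.5):
    """Assumed simplification: gradient ascent on free logits, no network."""
    lik = [[poisson_pmf(y, t) for t in grid] for y in ys]
    n = len(ys)
    logits = [0.0] * len(grid)  # start from the uniform prior on the grid
    for _ in range(steps):
        w = softmax(logits)
        # post[j] accumulates the posterior probability of grid point j
        # over all observations.
        post = [0.0] * len(grid)
        for row in lik:
            denom = sum(wj * lij for wj, lij in zip(w, row))
            for j, (wj, lij) in enumerate(zip(w, row)):
                post[j] += wj * lij / denom
        # Gradient of the mean log-likelihood w.r.t. logit j: post[j]/n - w[j].
        logits = [z + lr * (p / n - wj) for z, p, wj in zip(logits, post, w)]
    return softmax(logits)


def sample_poisson(theta):
    """Knuth's multiplication method; adequate for small theta."""
    limit = math.exp(-theta)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= limit:
            return k - 1
```

As a usage example under the same assumptions, one could simulate counts from a two-point prior, for instance `ys = [sample_poisson(random.choice([2.0, 10.0])) for _ in range(100)]`, build a positive grid such as `grid = [0.5 + 0.75 * j for j in range(20)]`, and call `fit_logits(ys, grid)`. The returned list is a valid PMF over the grid by construction, whatever values the logits take during training.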
