
Neural-g: A Deep Learning Framework for Mixing Density Estimation

Shijie Wang, Saptarshi Chakraborty, Qian Qin, Ray Bai

TL;DR

This paper introduces neural-$g$, a deep neural network framework for estimating mixing densities in empirical Bayes $g$-modeling. By outputting a valid probability mass function over a fixed grid $\Theta_m$ via a softmax layer, neural-$g$ flexibly captures a wide range of priors, including discrete, continuous, flat, heavy-tailed, and bounded shapes. The authors establish a universal approximation theorem for PMFs with a softmax output, propose a weighted average gradient optimizer to accelerate training, and extend the method to multivariate priors. Through extensive simulations and real-data applications (Poisson mixtures and measurement-error models), neural-$g$ demonstrates competitive accuracy and reliable uncertainty quantification, often outperforming NPMLE and Efron’s $g$ when priors are non-smooth or contain flat regions. A publicly available software package supports implementation, enabling broad use in empirical Bayes and latent-mixture problems.
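To make the softmax-over-a-grid idea concrete, here is a minimal NumPy sketch of the forward pass for a Poisson mixture. It assumes a hypothetical one-hidden-layer ReLU network that maps each grid point $\theta_j$ to a logit, plus an illustrative grid of 100 points on $[0.1, 20]$ and random, untrained weights. None of these choices comes from the paper. The softmax turns the logits into a valid PMF on $\Theta_m$. The sum of log marginal likelihoods is the usual $g$-modeling objective, which we assume neural-$g$ maximizes over the network parameters. The PMF then gives posterior means and a grid CDF (the kind of curve plotted in Figure 1).

```python
import math
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; output is positive and sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def mlp_logits(theta_grid, W1, b1, W2, b2):
    # HYPOTHETICAL one-hidden-layer ReLU network: grid point theta_j -> scalar logit.
    h = np.maximum(0.0, theta_grid[:, None] * W1 + b1)   # shape (m, hidden)
    return h @ W2 + b2                                   # shape (m,)

def poisson_lik_matrix(y, theta_grid):
    # L[i, j] = f(y_i | theta_j) for a Poisson likelihood.
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])
    logL = y[:, None] * np.log(theta_grid) - theta_grid - log_fact[:, None]
    return np.exp(logL)

rng = np.random.default_rng(0)
theta_grid = np.linspace(0.1, 20.0, 100)                 # Theta_m, m = 100 (assumed)
hidden = 16
W1, b1 = rng.normal(size=hidden), rng.normal(size=hidden)
W2, b2 = rng.normal(size=hidden) / hidden, 0.0

p = softmax(mlp_logits(theta_grid, W1, b1, W2, b2))      # valid PMF on Theta_m

# Synthetic counts from a two-point prior (illustrative data only).
y = rng.poisson(lam=rng.choice([2.0, 10.0], size=500)).astype(float)
L = poisson_lik_matrix(y, theta_grid)                    # shape (n, m)
marginal = L @ p                                         # m(y_i) = sum_j p_j f(y_i | theta_j)
loglik = np.log(marginal).sum()                          # objective to maximize over the weights

# Empirical Bayes posterior weights and posterior means under the estimated prior.
post = L * p / marginal[:, None]
post_mean = post @ theta_grid
cdf = np.cumsum(p)                                       # estimated prior CDF on the grid
```

Because the output layer is a softmax, the constraints $p_j \ge 0$ and $\sum_j p_j = 1$ hold by construction. An optimizer can therefore update the weights freely, with no projection step.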

Abstract

Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a valid probability density. Under default hyperparameters, we show that neural-$g$ is very flexible and capable of capturing many unknown densities, including those with flat regions, heavy tails, and/or discontinuities. In contrast, existing methods struggle to capture all of these prior shapes. We provide justification for neural-$g$ by establishing a new universal approximation theorem regarding the capability of neural networks to learn arbitrary probability mass functions. To accelerate convergence of our numerical implementation, we utilize a weighted average gradient descent approach to update the network parameters. Finally, we extend neural-$g$ to multivariate prior density estimation. We illustrate the efficacy of our approach through simulations and analyses of real datasets. A software package to implement neural-$g$ is publicly available at https://github.com/shijiew97/neuralG.
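The abstract mentions a weighted average gradient descent update but does not spell it out. The sketch below therefore assumes one simple form: each step moves along a weighted average of the last few gradients, with linearly increasing weights. The paper's Algorithm 1 may differ. To keep the gradient analytic, the sketch also replaces the network with free logits $z_j$ (in effect, a network with no hidden layers). For the average marginal log-likelihood, the gradient is $\partial \ell / \partial z_k = \frac{1}{n}\sum_i (w_{ik} - p_k)$, where $w_{ik}$ is the posterior weight of $\theta_k$ given $y_i$. The grid, data, step size, and window length are all illustrative.

```python
from collections import deque
import math
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability.
    e = np.exp(z - z.max())
    return e / e.sum()

def poisson_lik_matrix(y, theta_grid):
    # L[i, j] = f(y_i | theta_j) for a Poisson likelihood.
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])
    return np.exp(y[:, None] * np.log(theta_grid) - theta_grid - log_fact[:, None])

def mean_loglik_and_grad(z, L):
    # Average marginal log-likelihood (1/n) sum_i log sum_j softmax(z)_j L[i, j]
    # and its gradient in the logits: (1/n) sum_i (w_ik - p_k).
    p = softmax(z)
    m = L @ p
    w = L * p / m[:, None]                 # posterior weights; each row sums to 1
    return np.log(m).mean(), w.mean(axis=0) - p

def weighted_average_ascent(L, steps=500, lr=0.5, window=5):
    # ASSUMED scheme: step along a weighted average of the last `window` gradients,
    # with linearly increasing weights so the newest gradient counts most.
    z = np.zeros(L.shape[1])               # uniform starting PMF
    history = deque(maxlen=window)
    trace = []
    for _ in range(steps):
        ll, g = mean_loglik_and_grad(z, L)
        trace.append(ll)
        history.append(g)
        wts = np.arange(1, len(history) + 1, dtype=float)
        direction = sum(wt * h for wt, h in zip(wts, history)) / wts.sum()
        z = z + lr * direction             # ascent: we maximize the log-likelihood
    return softmax(z), trace

rng = np.random.default_rng(1)
theta_grid = np.linspace(0.1, 20.0, 100)
y = rng.poisson(lam=rng.choice([2.0, 10.0], size=500)).astype(float)
p_hat, trace = weighted_average_ascent(poisson_lik_matrix(y, theta_grid))
```

Averaging over a short window smooths the noisy, correlated gradient steps much as momentum does. Setting `window=1` recovers plain gradient ascent on the logits.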

Paper Structure (20 sections, 2 theorems, 39 equations, 8 figures, 5 tables, 1 algorithm)

This paper contains 20 sections, 2 theorems, 39 equations, 8 figures, 5 tables, 1 algorithm.

Key Result

Theorem 2.1

Let $p(\theta)$ be an arbitrary PMF defined on ${\Theta_m} = \{ \theta_1, \ldots, \theta_m \}$, and let $p(\Theta_m) = ( p(\theta_1), \ldots, p(\theta_m))^\top$ denote the probability vector where $p(\theta_j) = \mathbb{P}(\theta = \theta_j), j = 1, \ldots, m$. Meanwhile, let $g_{\phi}(\Theta_m) = (
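For intuition about why a softmax layer is a natural output for PMFs on $\Theta_m$, the elementary identity below may help. It is an illustration only, not the statement or proof of Theorem 2.1. If every $p(\theta_j) > 0$, taking logits $z_j = \log p(\theta_j)$ reproduces the PMF exactly:

$$\mathrm{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^m e^{z_k}} = \frac{p(\theta_j)}{\sum_{k=1}^m p(\theta_k)} = p(\theta_j), \qquad z_j = \log p(\theta_j).$$

Now suppose $p(\theta_j) = 0$ for the $r$ indices $j$ in a set $Z$. Keep $z_j = \log p(\theta_j)$ for $j \notin Z$ and set $z_j = -c$ for $j \in Z$. An exact match is then impossible, but the $\ell_1$ error vanishes as $c \to \infty$:

$$\sum_{j=1}^m \bigl|\mathrm{softmax}(z)_j - p(\theta_j)\bigr| = \frac{2 r e^{-c}}{1 + r e^{-c}} \le 2 r e^{-c}.$$

So on a finite grid, the approximation question reduces to whether a network $g_\phi$ can output suitable logits at the $m$ points. Presumably this is the gap the theorem closes.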

Figures (8)

  • Figure 1: Performance of different estimators when the true mixing density is a point mass prior (left two panels) or a Gaussian prior (right two panels). The top two panels plot the estimated densities, while the bottom two panels plot the estimated CDFs.
  • Figure 2: Left panel: Estimated priors from one replication of Simulation I (Uniform prior). Right panel: Estimated priors from one replication of Simulation II (Piecewise constant prior).
  • Figure 3: Estimated priors from one replication of Simulation III (Heavy-tailed prior). Left panel: The estimated densities around the mode. Right panel: The upper tails of the estimated densities.
  • Figure 4: Left panel: Estimated priors from one replication of Simulation IV (Bounded prior). Right panel: Estimated priors from one replication of Simulation V (Point mass prior). In the left panel, NPMLE is not plotted because the REbayes package does not support mixture models with a log-normal density.
  • Figure 5: Left two panels: Plots of the estimated marginal densities for $\mu$ and $\sigma^2$ under two different Gaussian location-scale mixture models. Right two panels: Heatmaps and contour plots from 5000 samples of the estimated joint density $\widehat{\pi}_{\text{neural-}g}(\mu, \sigma^2)$ vs. the true $\pi(\mu, \sigma^2)$.
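Figure 5 shows marginal densities and 5000-sample heatmaps of a bivariate prior on $(\mu, \sigma^2)$. The NumPy sketch below illustrates how those summaries follow from a PMF on a two-dimensional grid. It assumes the multivariate extension places a single softmax over a product grid. The grid ranges, grid sizes, and stand-in logits are hypothetical; in neural-$g$, the logits would come from a network evaluated at each grid point.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# HYPOTHETICAL product grid for (mu, sigma^2); ranges and sizes are illustrative.
mu_grid = np.linspace(-3.0, 3.0, 40)
s2_grid = np.linspace(0.1, 4.0, 30)
MU, S2 = np.meshgrid(mu_grid, s2_grid, indexing="ij")    # each of shape (40, 30)
grid = np.column_stack([MU.ravel(), S2.ravel()])        # m = 1200 points in R^2

# Stand-in logits (not a trained network).
rng = np.random.default_rng(0)
logits = -0.5 * grid[:, 0] ** 2 - grid[:, 1] + 0.1 * rng.normal(size=len(grid))
joint = softmax(logits).reshape(MU.shape)                # joint PMF pi(mu_a, sigma2_b)

marg_mu = joint.sum(axis=1)    # marginal PMF of mu (length 40)
marg_s2 = joint.sum(axis=0)    # marginal PMF of sigma^2 (length 30)

# Draw 5000 (mu, sigma^2) pairs from the joint PMF, e.g. to build a heatmap.
idx = rng.choice(len(grid), size=5000, p=joint.ravel())
samples = grid[idx]
```

The sketch flattens the grid in the same (C) order that `reshape` uses. That way, index `idx` in `joint.ravel()` and row `idx` of `grid` always refer to the same $(\mu, \sigma^2)$ point.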

Theorems & Definitions (6)

  • Theorem 2.1
  • proof
  • Theorem 2.2
  • proof
  • proof
  • proof