Neural-g: A Deep Learning Framework for Mixing Density Estimation

Shijie Wang; Saptarshi Chakraborty; Qian Qin; Ray Bai

TL;DR

This paper introduces neural-$g$, a deep neural network framework for estimating mixing densities in empirical Bayes $g$-modeling. By outputting a valid probability mass function over a fixed grid $\Theta_m$ via a softmax layer, neural-$g$ flexibly captures a wide range of priors, including discrete, continuous, flat, heavy-tailed, and bounded shapes. The authors establish a universal approximation theorem for PMFs with a softmax output, propose a weighted average gradient optimizer to accelerate training, and extend the method to multivariate priors. Through extensive simulations and real-data applications (Poisson mixtures and measurement-error models), neural-$g$ demonstrates competitive accuracy and reliable uncertainty quantification, often outperforming NPMLE and Efron’s $g$ when priors are non-smooth or contain flat regions. A publicly available software package supports implementation, enabling broad use in empirical Bayes and latent-mixture problems.

Abstract

Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a valid probability density. Under default hyperparameters, we show that neural-$g$ is very flexible and capable of capturing many unknown densities, including those with flat regions, heavy tails, and/or discontinuities. In contrast, existing methods struggle to capture all of these prior shapes. We provide justification for neural-$g$ by establishing a new universal approximation theorem regarding the capability of neural networks to learn arbitrary probability mass functions. To accelerate convergence of our numerical implementation, we utilize a weighted average gradient descent approach to update the network parameters. Finally, we extend neural-$g$ to multivariate prior density estimation. We illustrate the efficacy of our approach through simulations and analyses of real datasets. A software package to implement neural-$g$ is publicly available at https://github.com/shijiew97/neuralG.
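
As a rough illustration of the softmax idea described above, the following minimal sketch maps a vector of logits to a valid PMF over a fixed grid and evaluates a Poisson mixture marginal log-likelihood under that PMF. The grid values, logits, counts, and function names are all hypothetical, and the use of the marginal log-likelihood as an objective is an assumption, since this excerpt does not state the training loss. In neural-$g$ the logits would be produced by a neural network; here they are fixed numbers for clarity.

```python
import math

def softmax(logits):
    """Map real-valued logits to a probability vector (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def poisson_pmf(y, rate):
    """Poisson likelihood f(y | theta), with theta > 0 as the rate."""
    return math.exp(-rate + y * math.log(rate) - math.lgamma(y + 1))

def marginal_log_likelihood(ys, grid, weights):
    """Sum over observations of log( sum_j f(y_i | theta_j) * g(theta_j) )."""
    total = 0.0
    for y in ys:
        mix = sum(w * poisson_pmf(y, t) for t, w in zip(grid, weights))
        total += math.log(mix)
    return total

# Hypothetical grid Theta_m and logits (assumed values, not from the paper).
grid = [0.5, 1.0, 2.0, 4.0, 8.0]
logits = [0.0, 1.0, 0.5, -1.0, -2.0]

# The softmax output is nonnegative and sums to one, so it is a valid PMF on the grid.
weights = softmax(logits)

# Hypothetical observed counts.
counts = [0, 1, 1, 3, 2, 7]
loglik = marginal_log_likelihood(counts, grid, weights)
```

Because the softmax output is strictly positive and sums to one for any logits, no explicit constraint on the estimated prior is needed during optimization; only the logits (or the network producing them) are updated.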

Paper Structure

This paper contains 20 sections, 2 theorems, 39 equations, 8 figures, 5 tables, and 1 algorithm.

Introduction
The Neural-$g$ Estimator
Motivating Examples
Introducing Neural-$g$
Neural-$g$'s Approximation Capability
Related Work
Implementation of Neural-$g$
Synthetic Illustrations
Estimation Performance
Uncertainty Quantification Performance
Multivariate Neural-$g$
Real Data Applications
Poisson Mixture Examples
Measurement Error Example
Conclusion
...and 5 more sections

Key Result

Theorem 2.1

Let $p(\theta)$ be an arbitrary PMF defined on ${\Theta_m} = \{ \theta_1, \ldots, \theta_m \}$, and let $p(\Theta_m) = ( p(\theta_1), \ldots, p(\theta_m))^\top$ denote the probability vector where $p(\theta_j) = \mathbb{P}(\theta = \theta_j), j = 1, \ldots, m$. Meanwhile, let $g_{\phi}(\Theta_m) = (

Figures (8)

Figure 1: Performance of different estimators when the true mixing density is a point mass prior (left two panels) or a Gaussian prior (right two panels). The top two panels plot the estimated densities, while the bottom two panels plot the estimated CDFs.
Figure 2: Left panel: Estimated priors from one replication of Simulation I (Uniform prior). Right panel: Estimated priors from one replication of Simulation II (Piecewise constant prior).
Figure 3: Estimated priors from one replication of Simulation III (Heavy-tailed prior). Left panel: The estimated densities around the mode. Right panel: The upper tails of the estimated densities.
Figure 4: Left panel: Estimated priors from one replication of Simulation IV (Bounded prior). Right panel: Estimated priors from one replication of Simulation V (Point mass prior). In the left panel, NPMLE is not plotted because the REbayes package does not support mixture models with a log-normal density.
Figure 5: Left two panels: Plots of the estimated marginal densities for $\mu$ and $\sigma^2$ under two different Gaussian location-scale mixture models. Right two panels: Heatmaps and contour plots from 5000 samples of the estimated joint density $\widehat{\pi}_{\text{neural-}g}(\mu, \sigma^2)$ vs. the true $\pi(\mu, \sigma^2)$.
...and 3 more figures

Theorems & Definitions (6)

Theorem 2.1
proof
Theorem 2.2
proof
proof
proof