Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood

Jake A. Soloff, Adityanand Guntuboyina, Bodhisattva Sen

TL;DR

The paper proves an oracle inequality implying that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising without prior knowledge.

Abstract

Multivariate, heteroscedastic errors complicate statistical inference in many large-scale denoising problems. Empirical Bayes is attractive in such settings, but standard parametric approaches rest on assumptions about the form of the prior distribution which can be hard to justify and which introduce unnecessary tuning parameters. We extend the nonparametric maximum likelihood estimator (NPMLE) for Gaussian location mixture densities to allow for multivariate, heteroscedastic errors. NPMLEs estimate an arbitrary prior by solving an infinite-dimensional, convex optimization problem; we show that this convex optimization problem can be tractably approximated by a finite-dimensional version. The empirical Bayes posterior means based on an NPMLE have low regret, meaning they closely target the oracle posterior means one would compute with the true prior in hand. We prove an oracle inequality implying that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising without prior knowledge. We provide finite-sample bounds on the average Hellinger accuracy of an NPMLE for estimating the marginal densities of the observations. We also demonstrate the adaptive and nearly-optimal properties of NPMLEs for deconvolution. We apply our method to two denoising problems in astronomy, constructing a fully data-driven color-magnitude diagram of 1.4 million stars in the Milky Way and investigating the distribution of 19 chemical abundance ratios for 27 thousand stars in the red clump. We also apply our method to hierarchical linear models, illustrating the advantages of nonparametric shrinkage of regression coefficients on an education data set and on a microarray data set.
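
The finite-dimensional approximation and the empirical Bayes posterior means described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names are hypothetical, the candidate atoms are simply placed at the observations, the mixing weights are fit with a plain EM iteration (a simple substitute for whatever convex solver the paper uses), and the toy data mimic the circle example of Figure 2 with made-up noise levels.

```python
import numpy as np

def likelihood_matrix(X, Sigmas, atoms):
    """L[i, j] = N(X_i; atoms_j, Sigma_i), one covariance per observation."""
    n, d = X.shape
    prec = np.linalg.inv(Sigmas)                  # (n, d, d) stacked inverses
    _, logdet = np.linalg.slogdet(Sigmas)         # (n,) log-determinants
    diff = X[:, None, :] - atoms[None, :, :]      # (n, m, d)
    quad = np.einsum("imd,ide,ime->im", diff, prec, diff)
    return np.exp(-0.5 * (quad + logdet[:, None] + d * np.log(2 * np.pi)))

def fit_weights_em(L, n_iter=500):
    """Maximize the average log-likelihood over weights on fixed atoms (EM)."""
    m = L.shape[1]
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        post = L * w                              # unnormalized posteriors
        post /= post.sum(axis=1, keepdims=True)   # each row sums to one
        w = post.mean(axis=0)                     # EM update of the weights
    return w

def posterior_means(L, w, atoms):
    """Empirical Bayes estimates: posterior mean of theta_i under the fitted prior."""
    post = L * w
    post /= post.sum(axis=1, keepdims=True)
    return post @ atoms

def avg_loglik(L, w):
    """Average log marginal likelihood of the observations under weights w."""
    return np.log(L @ w).mean()

# Toy data (assumed setup): means uniform on a circle of radius 2,
# heteroscedastic diagonal Gaussian noise with variances in [0.25, 1].
rng = np.random.default_rng(0)
n, d = 200, 2
angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
theta = 2.0 * np.column_stack([np.cos(angles), np.sin(angles)])
variances = rng.uniform(0.25, 1.0, size=(n, d))
Sigmas = variances[:, :, None] * np.eye(d)        # (n, d, d) diagonal covariances
X = theta + rng.standard_normal((n, d)) * np.sqrt(variances)

atoms = X.copy()                                  # assumption: atoms at the data
L = likelihood_matrix(X, Sigmas, atoms)
w = fit_weights_em(L)
theta_hat = posterior_means(L, w, atoms)

print("raw MSE:     ", np.mean(np.sum((X - theta) ** 2, axis=1)))
print("denoised MSE:", np.mean(np.sum((theta_hat - theta) ** 2, axis=1)))
```

Restricting the prior to a fixed set of atoms turns the infinite-dimensional problem into a concave maximization over the probability simplex, which is why a simple multiplicative update such as EM never decreases the objective. Each estimate in `theta_hat` is a convex combination of the atoms, so the denoised points are pulled toward regions where the fitted prior places mass.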

Paper Structure

This paper contains 39 sections, 16 theorems, 188 equations, 10 figures, 1 algorithm.

Key Result

Lemma 1

Problem eq-NPMLE attains its maximum: there exists a discrete solution $\widehat{G}_n$ with at most $n$ atoms, and the vector $\hat{L} \equiv (\hat{L}_1,\dots,\hat{L}_n)= (f_{\widehat{G}_n, \Sigma_i}(X_i))_{i=1}^n$ of fitted likelihood values is unique. Moreover, $\widehat{G}_n\in \mathcal{P}$ [...] The support of any $\widehat{G}_n$ is contained in the zero set $\mathcal{Z} \coloneqq \{\vartheta$ [...]

Figures (10)

  • Figure 1: A noisy color-magnitude diagram (CMD) corresponding to the observations $X_i$ in model \ref{eq-obs-model}, with corresponding fully-nonparametric denoised estimates $\hat{\theta}_i$ in the right panel. To avoid overplotting, we display a subsample of $n=10^5$ stars.
  • Figure 2: Toy data of size $n=1,000$ and $d=2$. Top: observations $X_i$ (left) were generated by adding heteroscedastic Gaussian errors to the underlying means $\theta^*_i\stackrel{\text{i.i.d.}}{\sim} G^*$ (right), generated i.i.d. uniformly from a circle of radius $2$. Our discrete estimate ${\widehat{G}_n}$ of the prior is shown in red over the prior $G^*$ in black. Bottom: a comparison of oracle Bayes $\hat{\theta}^*_i$ (left) based on knowledge of the prior distribution $G^*$ and empirical Bayes $\hat{\theta}_i$ (right), a function of the observed data.
  • Figure 3: Level sets of the dual mixture density $\widehat{\psi}_n = f_{H, \sigma^2I_2}$ where $n=3$ and $H = \frac{1}{3}\sum_{i=1}^3 \delta_{X_i}$ is uniform over the vertices of the larger equilateral triangle $\triangle X_1X_2X_3$. With $\sigma^2 = \frac{3}{\log 256}$, the dual mixture density $\widehat{\psi}_n$ has four global modes.
  • Figure 4: Left: An example of observations (blue points) $X_1 = (0, 1)$, $X_2 = (0, -1)$, $X_3 = (1, 0)$, and $X_4 = (-1, 0)$ with diagonal covariances (dashed ellipses) $\Sigma_1=\Sigma_2 = \begin{pmatrix} 5 & 0 \\ 0 & 0.05 \end{pmatrix}$ and $\Sigma_3=\Sigma_4 = \begin{pmatrix} 0.05 & 0 \\ 0 & 5 \end{pmatrix}$, where the NPMLE is supported on atoms (red points) $a_1,\dots,a_4$ well outside the convex hull of the data, and near the corners of the minimum axis-aligned bounding box. Right: The mixture $\widehat{\psi}_n(\vartheta) = \frac{1}{4}\sum_{i=1}^4 \varphi_{\Sigma_i}(X_i - \vartheta)$ only has modes at the atoms $a_1,\dots,a_4$, so no NPMLE is supported within the convex hull of the data.
  • Figure 5: Left: An example of observations (blue x's) $X_1 = (0, 1)$, $X_2 = (\frac{\sqrt{3}}{2}, -\frac{1}{2})$, and $X_3 = (-\frac{\sqrt{3}}{2}, -\frac{1}{2})$ with covariances (dashed ellipses) $\Sigma_1=\begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}$, $\Sigma_2=\begin{pmatrix} 1.75 & \frac{3\sqrt{3}}{4} \\ \frac{3\sqrt{3}}{4} & 3.25 \end{pmatrix}$ and $\Sigma_3=\begin{pmatrix} 1.75 & -\frac{3\sqrt{3}}{4} \\ -\frac{3\sqrt{3}}{4} & 3.25 \end{pmatrix}$, where the NPMLE is supported on atoms (red $\blacktriangle$'s). Blue $\bullet$'s represent atoms $x^*(\alpha)$ sampled from the ridgeline manifold $\mathcal{M}$ with $\alpha$ being $2$-sparse, and blue $\star$'s represent the same with $\alpha$ being $3$-sparse. Middle: same plot but each $\Sigma_i$ is scaled by $4$. Right: same plot but each $\Sigma_i$ is scaled by $8$.
  • ...and 5 more figures

Theorems & Definitions (33)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proposition 4
  • Lemma 5
  • Proposition 6
  • Theorem 7
  • Corollary 8
  • Theorem 9
  • Remark 10
  • ...and 23 more