Convergence rates for estimating multivariate scale mixtures of uniform densities

Arlene K. H. Kim, Gil Kur, Adityanand Guntuboyina

TL;DR

This work extends the Grenander density-estimation paradigm to multivariate scale mixtures of uniform (SMU) densities. It proves that the multivariate SMU-MLE $\hat p_{n,d}^{\mathrm{SMU}}$ attains a Hellinger risk of order $n^{-2/3}$ up to a dimension-dependent logarithmic factor (with $\gamma_d=4d-2$), assuming that $p_0$ has compact support, is bounded above, and satisfies a relaxed lower-bound assumption. The core methodological advance is a general Hellinger-accuracy theorem for MLEs over convex density classes (Theorem hellexp), which reduces estimation risk to empirical-process bounds and entropy. Bracketing-entropy bounds for SMU subclasses, derived via distribution-function entropies of nonnegative measures, enable the main rates, a minimax lower bound $\inf_{\tilde p_n}\sup_{p_0} \mathbb E h^2(p_0,\tilde p_n) \ge c_d n^{-2/3}(\log n)^{(d-1)/3}$, and adaptive results when $p_0$ is piecewise constant on rectangles. Computationally, the authors provide exact and approximate SMU-MLE algorithms and demonstrate them on real and simulated data, including leukemia gene-expression $p$-values, where the fit illustrates a mixture of a uniform component with mass near the origin.

Abstract

The Grenander estimator is a well-studied procedure for univariate nonparametric density estimation. It is usually defined as the Maximum Likelihood Estimator (MLE) over the class of all non-increasing densities on the positive real line. It can also be seen as the MLE over the class of all scale mixtures of uniform densities. Using the latter viewpoint, Pavlides and Wellner \cite{pavlides2012nonparametric} proposed a multivariate extension of the Grenander estimator as the nonparametric MLE over the class of all multivariate scale mixtures of uniform densities. We prove that this multivariate estimator achieves the univariate cube root rate of convergence with only a logarithmic multiplicative factor that depends on the dimension. The usual curse of dimensionality is therefore avoided to some extent for this multivariate estimator. This result positively resolves a conjecture of Pavlides and Wellner \cite{pavlides2012nonparametric} under an additional lower bound assumption. Our proof proceeds via a general result on the Hellinger accuracy of MLEs over convex classes of densities. We also provide algorithms for computing the estimator, and illustrate performance on real and simulated datasets.
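
For orientation, the univariate Grenander estimator mentioned in the abstract can be sketched in a few lines. It relies on the standard characterization of the estimator as the left derivative of the least concave majorant of the empirical distribution function. This is a minimal illustration and not the paper's multivariate algorithm; the function name `grenander` is hypothetical, and the sketch assumes strictly positive, distinct samples.

```python
import numpy as np

def grenander(samples):
    """Univariate Grenander estimate (sketch, assumes positive distinct samples).

    Returns (knots, heights): the estimate equals heights[i] on the
    interval (knots[i], knots[i+1]] and is zero beyond the last knot.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    # Vertices of the empirical CDF, starting from the origin (0, 0).
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n

    hull = [0]  # indices of the vertices of the least concave majorant
    for i in range(1, n + 1):
        # Drop the last vertex while it lies on or below the chord
        # joining the vertex before it to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (px[b] - px[a]) * (py[i] - py[a]) \
                  - (py[b] - py[a]) * (px[i] - px[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)

    knots = px[hull]
    # Slopes of the majorant are the (non-increasing) density values.
    heights = np.diff(py[hull]) / np.diff(knots)
    return knots, heights
```

The returned step function is non-increasing and integrates to one by construction, since the majorant runs from $(0, 0)$ to the largest observation at height $1$.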

Paper Structure

This paper contains 23 sections, 27 theorems, 308 equations, 4 figures, 2 algorithms.

Key Result

Theorem 2.1

Consider the setting described above. For $t \geq 0$, let where $P_0$ is the probability measure corresponding to the true density $p_0$ and $P_n$ is the empirical distribution of $X_1, \dots, X_n$. All expectations below are with respect to $P_0$. Suppose there exist two real numbers $t_0 > 0$ and $0 < \eta \leq 1$, and a function $\bar{G}: [0, \infty) \r Then and

Figures (4)

  • Figure 1: True density $p_0$ (left panel) and estimated density computed using Algorithm \ref{alg:smu_mle} from $n = 400$ points drawn from $p_0$. Here $p_0$ is given by \ref{pwsmudef} with $G$ taken to be the discrete uniform distribution on $\{\vartheta_1,\dots, \vartheta_n\}$ where $\vartheta_j = (5 + 5 \cos(\pi j/(2n)), 5 + 5 \sin (\pi j /(2n)))$.
  • Figure 2: True density $p_0$ (left panel; same as in Figure 1) and estimated density computed using Algorithm \ref{alg:smu_mle_approx} from $n = 2000$ points drawn from $p_0$.
  • Figure 3: 7129 $p$-values obtained from the micro-array dataset of \cite{golub1999molecular}. Each point in this plot corresponds to one gene, and represents a pair of $p$-values.
  • Figure 4: An approximate SMU MLE fitted to the data in Figure 3 using Algorithm \ref{alg:smu_mle_approx}.
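
The simulation setup described in the caption of Figure 1 can be sketched as follows. The sketch assumes that the referenced definition takes the usual scale-mixture-of-uniforms form $p(x) = \int \prod_{k=1}^d \theta_k^{-1} \mathbf{1}\{0 \le x_k \le \theta_k\} \, dG(\theta)$, with $G$ the discrete uniform distribution on the atoms $\vartheta_j$ from the caption. The function names (`atoms`, `smu_density`, `smu_sample`) are hypothetical, and the code only generates data and evaluates the true density; it does not implement either of the paper's algorithms.

```python
import numpy as np

def atoms(m):
    """Atoms theta_j = (5 + 5cos(pi j/(2m)), 5 + 5sin(pi j/(2m))), j = 1..m."""
    j = np.arange(1, m + 1)
    ang = np.pi * j / (2 * m)
    return np.column_stack((5 + 5 * np.cos(ang), 5 + 5 * np.sin(ang)))

def smu_density(x, theta):
    """Evaluate the mixture density at the rows of x.

    Assumes G puts equal mass on each row of theta, so the density is the
    average of the uniform densities on the rectangles [0, theta_j].
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    # inside[i, j] is True when x[i] lies in the rectangle [0, theta[j]].
    inside = np.all((x[:, None, :] >= 0) & (x[:, None, :] <= theta[None, :, :]),
                    axis=2)
    return (inside / np.prod(theta, axis=1)).mean(axis=1)

def smu_sample(n, theta, rng):
    """Draw n points: pick an atom uniformly, then scale a uniform vector by it."""
    idx = rng.integers(0, theta.shape[0], size=n)
    return rng.uniform(size=(n, theta.shape[1])) * theta[idx]
```

For example, `smu_sample(400, atoms(400), np.random.default_rng(0))` draws a sample of the size used in Figure 1, under the assumptions stated above.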

Theorems & Definitions (51)

  • Theorem 2.1
  • Theorem 2.2: \cite{Ossiander87bracket} and Theorem 19.36 of \cite{vaart98book}
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Theorem 3.4: Theorem 1.1 of \cite{gao2013book}
  • Theorem 4.1
  • Corollary 4.2
  • Corollary 4.3
  • Corollary 4.4
  • ...and 41 more