Table of Contents
Fetching ...

On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture Models

Yangjing Zhang, Ying Cui, Bodhisattva Sen, Kim-Chuan Toh

TL;DR

This paper proposes, leveraging the sparsity of the solution, an efficient and scalable semismooth Newton based augmented Lagrangian method (ALM), which beats the state-of-the-art methods and proposes new denoising estimands in this context along with their consistent estimates.

Abstract

In this paper we study the computation of the nonparametric maximum likelihood estimator (NPMLE) in multivariate mixture models. Our first approach discretizes this infinite dimensional convex optimization problem by fixing the support points of the NPMLE and optimizing over the mixture proportions. In this context we propose, leveraging the sparsity of the solution, an efficient and scalable semismooth Newton based augmented Lagrangian method (ALM). Our algorithm beats the state-of-the-art methods~\cite{koenker2017rebayes, kim2020fast} and can handle $n \approx 10^6$ data points with $m \approx 10^4$ support points. Our second procedure, which combines the expectation-maximization (EM) algorithm with the ALM approach above, allows for joint optimization of both the support points and the probability weights. For both our algorithms we provide formal results on their (superlinear) convergence properties. The computed NPMLE can be immediately used for denoising the observations in the framework of empirical Bayes. We propose new denoising estimands in this context along with their consistent estimates. Extensive numerical experiments are conducted to illustrate the effectiveness of our methods. In particular, we employ our procedures to analyze two astronomy data sets: (i) Gaia-TGAS Catalog~\cite{anderson2018improving} containing $n \approx 1.4 \times 10^6$ data points in two dimensions, and (ii) the $d=19$ dimensional data set from the APOGEE survey~\cite{majewski2017apache} with $n \approx 2.7 \times 10^4$.

On Efficient and Scalable Computation of the Nonparametric Maximum Likelihood Estimator in Mixture Models

TL;DR

This paper proposes, leveraging the sparsity of the solution, an efficient and scalable semismooth Newton based augmented Lagrangian method (ALM), which beats the state-of-the-art methods and proposes new denoising estimands in this context along with their consistent estimates.

Abstract

In this paper we study the computation of the nonparametric maximum likelihood estimator (NPMLE) in multivariate mixture models. Our first approach discretizes this infinite dimensional convex optimization problem by fixing the support points of the NPMLE and optimizing over the mixture proportions. In this context we propose, leveraging the sparsity of the solution, an efficient and scalable semismooth Newton based augmented Lagrangian method (ALM). Our algorithm beats the state-of-the-art methods~\cite{koenker2017rebayes, kim2020fast} and can handle data points with support points. Our second procedure, which combines the expectation-maximization (EM) algorithm with the ALM approach above, allows for joint optimization of both the support points and the probability weights. For both our algorithms we provide formal results on their (superlinear) convergence properties. The computed NPMLE can be immediately used for denoising the observations in the framework of empirical Bayes. We propose new denoising estimands in this context along with their consistent estimates. Extensive numerical experiments are conducted to illustrate the effectiveness of our methods. In particular, we employ our procedures to analyze two astronomy data sets: (i) Gaia-TGAS Catalog~\cite{anderson2018improving} containing data points in two dimensions, and (ii) the dimensional data set from the APOGEE survey~\cite{majewski2017apache} with .
Paper Structure (25 sections, 6 theorems, 82 equations, 16 figures, 6 tables, 1 algorithm)

This paper contains 25 sections, 6 theorems, 82 equations, 16 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

Let $\{\sigma_k\}_{k\geq 0}$ be a nondecreasing positive sequence converging to $\sigma_\infty \leq \infty$. Let $\{(u^k, v^k, x^k, y^k)\}_{k\geq 1}$ be the sequence generated by the ALM with each subproblem satisfying the stopping criterion (S1). Then the primal sequence $\{(x^k, y^k)\}_{k\geq 1}$

Figures (16)

  • Figure 1: Left: The noisy CMD corresponding to data $\{Y_i\}_{i=1}^n \subset \mathbb{R}^2$ for $n \approx 1.4 \times 10^6$ stars obtained from the Gaia-TGAS Catalog. Middle: The denoised CMD using empirical Bayes estimates $\{\widehat{\theta}_i\}_{i=1}^n$ based on the NPMLE computed via the augmented Lagrangian method (see Section \ref{['sec:ALM0']}). Right: The denoised CMD computed via the partial expectation maximization algorithm (see Section \ref{['sec:adaptive supports']}). Both the denoised CMDs have rather sharp tails in the bottom of the plots (i.e., the main sequence) and the top right (i.e., the tip of the red-giant branch) as well as a definitive cluster in the center-right (i.e., the red clump).
  • Figure 2: The distribution of the singular values $\sigma_i(L)$ of $L$ computed from (left) the APOGEE data (here $n=27,135$, and $m=1,000$) and (right) the synthetic Example 3(a) (here $n=5,000$, $m=1,000$) as $d$ varies. The top $20$ singular values are excluded so that the others are not overshadowed. Observe the slow decay of the singular values when $d >1$.
  • Figure 3: Plots of the raw data (in blue) with $n= 5,000$ in $d=2$, the corresponding empirical Bayes estimates (in red), the true $G^*$ (in black), and $\widehat{G}_n$ (in black dots) obtained from our ALM. Here half of the true signals $\theta_i \in \mathbb{R}^d$ are drawn uniformly at random from each of the two concentric circles of radii 2 and 6 respectively (centered at $(0,0) \in \mathbb{R}^2$), and $Y_i \mid \theta_i \sim {\cal N}(\theta_i, I_2)$ for $i=1,\dots,n$. Observe that some of the empirical Bayes estimates $\widehat{\theta}_i$'s are far from the support of $G^*$ (and $\widehat{G}_n$).
  • Figure 4: Computational time (in seconds) of the ALM, mix-SQP, and REBayes with changing $n$ and $m$ for Example 2 (averaged over $10$ replications).
  • Figure 5: Results for the $d=2$ dimensional Gaia-TGAS data obtained from: (a) mix-SQP (with $m = 20^2$), (b) ALM (with $m = 100^2$). The number of support points of $\widehat{G}_n$ (see the (2,2) subplots) are: (a) $307$, (b) $1,677$, where the size of each support point plotted is proportional to its weight. The run times are: (a) 28 minutes (failed to converge within 100 iterations), (b) 472 minutes. With denser grid points we are able to obtain sharper denoised estimates that reveal finer details of the CMD.
  • ...and 11 more figures

Theorems & Definitions (11)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Theorem 1
  • proof
  • proof
  • Definition 1: Push-forward
  • Definition 2: Monge's problem
  • Definition 3: Kantorovich's problem
  • ...and 1 more