Nonparametric MLE for Gaussian Location Mixtures: Certified Computation and Generic Behavior
Yury Polyanskiy, Mark Sellke
TL;DR
This paper analyzes the nonparametric maximum likelihood estimator (NPMLE) for Gaussian location mixtures in one dimension. Although the NPMLE can have up to $n$ atoms, the authors show that for generic random data it has a well-behaved, certifiably computable structure. The key contributions are proofs that Lindsay’s stationarity conditions hold strictly almost surely, that the law of the NPMLE is absolutely continuous on the $k$-atom manifold, and that the local landscape has near-optimal properties, which imply locally linear convergence of EM and locally quadratic convergence of Newton's method near the solution. The authors introduce a certified computation framework: an $\varepsilon$-grid approximation followed by atom merging yields a certifiable Shub–Smale approximate NPMLE with provable Wasserstein error $O_X(\varepsilon^{1/4})$, together with a method to certify the exact atom count and a finite-time algorithm that attains these guarantees. They also extend the discussion to the static-support NPMLE with analogous certifiable guarantees, and show that in higher dimensions ($d\ge 2$) the NPMLE can exhibit unbounded support size, a qualitative difference from the one-dimensional setting.
Abstract
We study the nonparametric maximum likelihood estimator $\widehat{\pi}$ for Gaussian location mixtures in one dimension. It has been known since (Lindsay, 1983) that given an $n$-point dataset, this estimator always returns a mixture with at most $n$ components, and more recently (Wu-Polyanskiy, 2020) gave a sharp $O(\log n)$ bound for subgaussian data. In this work we study computational aspects of $\widehat{\pi}$. We provide an algorithm which for small enough $\varepsilon>0$ computes an $\varepsilon$-approximation of $\widehat{\pi}$ in Wasserstein distance in time $K+Cnk^2\log\log(1/\varepsilon)$. Here $K$ is data-dependent but independent of $\varepsilon$, while $C$ is an absolute constant and $k=|\operatorname{supp}(\widehat{\pi})|\leq n$ is the number of atoms in $\widehat{\pi}$. We also certifiably compute the exact value of $|\operatorname{supp}(\widehat{\pi})|$ in finite time. These guarantees hold almost surely whenever the dataset $(x_1,\dots,x_n)\in [-cn^{1/4},cn^{1/4}]$ consists of independent points from a probability distribution with a density (relative to Lebesgue measure). We also show the distribution of $\widehat{\pi}$ conditioned to be $k$-atomic admits a density on the associated $2k-1$ dimensional parameter space for all $k\leq \sqrt{n}/3$, and almost sure locally linear convergence of the EM algorithm. One key tool is a classical Fourier analytic estimate for non-degenerate curves.
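
To make the objects in the abstract concrete, the following is a minimal sketch, not the certified algorithm of the paper. It assumes unit-variance Gaussian noise, restricts the mixing measure to a fixed grid, and runs the standard EM update on the grid weights only (atom locations are not optimized and no merging or certification is done). It then evaluates Lindsay's gradient function $D(\theta)=\frac{1}{n}\sum_i \varphi(x_i-\theta)/f_\pi(x_i)$, where $f_\pi=\pi*\varphi$; the stationarity conditions say $D(\theta)\le 1$ for all $\theta$, with equality on the support of the NPMLE. The function names, grid size, iteration count, and synthetic data are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, atoms):
    """Matrix L[i, j] = phi(x_i - atoms_j), phi the standard normal density."""
    diff = x[:, None] - atoms[None, :]
    return np.exp(-0.5 * diff**2) / np.sqrt(2.0 * np.pi)

def log_likelihood(x, atoms, w):
    """Average log-likelihood of the mixture sum_j w_j N(atoms_j, 1)."""
    return np.mean(np.log(gaussian_kernel(x, atoms) @ w))

def grid_em(x, grid, n_iter=2000):
    """EM on the weights of a fixed grid (an assumption of this sketch)."""
    L = gaussian_kernel(x, grid)
    w = np.full(grid.size, 1.0 / grid.size)      # uniform initial weights
    for _ in range(n_iter):
        mix = L @ w                              # mixture density at each x_i
        w = w * (L / mix[:, None]).mean(axis=0)  # multiplicative EM update
    return w

def lindsay_gradient(x, grid, w, theta):
    """D(theta) = (1/n) * sum_i phi(x_i - theta) / f_pi(x_i)."""
    mix = gaussian_kernel(x, grid) @ w
    return (gaussian_kernel(x, theta) / mix[:, None]).mean(axis=0)

# Synthetic data (hypothetical): two well-separated components, unit noise.
rng = np.random.default_rng(0)
x = rng.choice([-2.0, 2.0], size=50) + rng.normal(size=50)

grid = np.linspace(x.min(), x.max(), 201)
w0 = np.full(grid.size, 1.0 / grid.size)
w = grid_em(x, grid)
D = lindsay_gradient(x, grid, w, grid)

print("log-likelihood, uniform weights:", log_likelihood(x, grid, w0))
print("log-likelihood, after EM:       ", log_likelihood(x, grid, w))
print("max of D on the grid:           ", D.max())
print("grid points with weight > 1e-3: ", int((w > 1e-3).sum()))
```

The EM weights typically concentrate on a few small clusters of neighbouring grid points, which is the kind of output that an atom-merging step would collapse into single atoms. The maximum of $D$ on the grid gives a rough, uncertified indication of how far the iterate is from satisfying the stationarity conditions.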
