Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization
Valentio Iverson, Stephen Vavasis
TL;DR
We address the problem of estimating the translation $\boldsymbol{\mu}$ and scale $\sigma$ for distributions of the form $\frac{1}{\sigma^l} f_0\left(\frac{\boldsymbol{x}-\boldsymbol{\mu}}{\sigma}\right)$ using Wasserstein distance, contrasting the intractability of MLE (NP-hard in general) with tractable Wasserstein minimization on a structured class of piecewise-constant densities. The authors formulate the estimation as a semi-discrete optimal transport problem, derive explicit first-order conditions yielding closed-form expressions for $\boldsymbol{\mu}^*$ and $\sigma^*$ in terms of moments of the source measure and the transport plan, and then present a randomized polynomial-time algorithm that approximates these parameters when the source is piecewise-constant on hyperrectangles. Central to the algorithm is maximizing a concave energy $\mathcal{E}(\boldsymbol{g})$ over Laguerre cells: cell volumes are estimated via KLS and separation oracles, the smoothness constant $L$ is shown to be $\text{poly}(n,l,k,1/s)$, and inexact gradient descent achieves the desired accuracy. A key hardness result shows that MLE remains NP-hard for the same class of distributions, which highlights the practical advantage of the Wasserstein approach and motivates future EM-like Wasserstein methods. Extending the results to broader function classes and to mixture models is left as an open question.
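For orientation, the textbook form of the semi-discrete dual energy is sketched below. This is the standard formulation and not necessarily the paper's exact notation: it assumes a source density $\rho$ on $\mathbb{R}^l$, target points $\boldsymbol{y}_1,\dots,\boldsymbol{y}_n$ with weights $\nu_i$ (for example $\nu_i = 1/n$ for $n$ samples), and squared Euclidean cost. The symbols $\rho$, $\boldsymbol{y}_i$, $\nu_i$ and $\mathrm{Lag}_i$ are assumptions introduced here for illustration.

```latex
% Laguerre cell of target point y_i for dual weights g (assumed notation)
\mathrm{Lag}_i(\boldsymbol{g}) = \left\{ \boldsymbol{x} \in \mathbb{R}^l :
  \|\boldsymbol{x}-\boldsymbol{y}_i\|^2 - g_i \le \|\boldsymbol{x}-\boldsymbol{y}_j\|^2 - g_j
  \ \text{for all } j \right\}

% Concave dual energy and its gradient
\mathcal{E}(\boldsymbol{g}) = \sum_{i=1}^{n} \int_{\mathrm{Lag}_i(\boldsymbol{g})}
  \left( \|\boldsymbol{x}-\boldsymbol{y}_i\|^2 - g_i \right) \rho(\boldsymbol{x})\, d\boldsymbol{x}
  + \sum_{i=1}^{n} \nu_i\, g_i,
\qquad
\frac{\partial \mathcal{E}}{\partial g_i}
  = \nu_i - \int_{\mathrm{Lag}_i(\boldsymbol{g})} \rho(\boldsymbol{x})\, d\boldsymbol{x}
```

In this form each gradient component is the gap between a target weight and the source mass of the corresponding Laguerre cell, which is why estimating cell volumes is the computational core when $\rho$ is piecewise-constant.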
Abstract
Parameter estimation is a fundamental challenge in machine learning, crucial for tasks such as neural network weight fitting and Bayesian inference. This paper focuses on the complexity of estimating, from $n$ samples, the translation $\boldsymbol{\mu} \in \mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters of a distribution of the form $\frac{1}{\sigma^l} f_0 \left( \frac{\boldsymbol{x} - \boldsymbol{\mu}}{\sigma} \right)$, where $f_0$ is a known density on $\mathbb{R}^l$. We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ within $\text{poly} \left( \frac{1}{\varepsilon} \right)$ time using the Wasserstein distance.
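To make the estimation problem concrete, here is a minimal one-dimensional sketch in Python. It is not the paper's algorithm: it assumes $l = 1$ and takes $f_0$ to be the Uniform$[0,1]$ density, a case where the optimal transport plan is given by quantile matching and the 2-Wasserstein minimizer has a closed form. The function name `fit_location_scale_uniform` is hypothetical.

```python
def fit_location_scale_uniform(samples):
    """Illustrative 1D sketch (assumption: f_0 = Uniform[0, 1]).

    Returns (mu, sigma) minimising the squared 2-Wasserstein distance between
    the empirical measure of `samples` and Uniform[mu, mu + sigma].
    """
    xs = sorted(samples)
    n = len(xs)
    mean_x = sum(xs) / n

    # In 1D the optimal plan sends the quantile slab ((i-1)/n, i/n] of f_0
    # to the i-th order statistic. For Uniform[0, 1] the quantile function is
    # Q_0(u) = u, whose integral over that slab is (2i - 1) / (2 n^2).
    cross = sum(x * (2 * i - 1) / (2 * n * n) for i, x in enumerate(xs, start=1))

    # Mean and variance of f_0 = Uniform[0, 1].
    m1, var0 = 0.5, 1.0 / 12.0

    # First-order conditions of the quadratic objective in (mu, sigma).
    sigma = (cross - mean_x * m1) / var0
    mu = mean_x - sigma * m1
    return mu, sigma


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    data = [rng.uniform(2.0, 5.0) for _ in range(20000)]  # true mu = 2, sigma = 3
    print(fit_location_scale_uniform(data))
```

The two formulas express $\sigma^*$ as a covariance between the samples and the quantiles of $f_0$, divided by the variance of $f_0$, and $\mu^*$ as a difference of means. In higher dimensions there is no sorting shortcut, so the transport plan itself has to be computed.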
