Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization

Valentio Iverson, Stephen Vavasis

TL;DR

This paper studies the estimation of the translation $\boldsymbol{\mu}$ and scale $\sigma$ for distributions of the form $\frac{1}{\sigma^l} f_0\left(\frac{\boldsymbol{x}-\boldsymbol{\mu}}{\sigma}\right)$ by minimizing the Wasserstein distance, and contrasts the intractability of MLE (NP-hard in general) with tractable Wasserstein minimization on a structured class of piecewise-constant densities. The authors formulate the estimation as a semi-discrete optimal transport problem and derive explicit first-order conditions that yield closed-form expressions for $\boldsymbol{\mu}^*$ and $\sigma^*$ in terms of moments of the source measure and the transport plan. They then present a randomized polynomial-time algorithm that approximates these parameters when the source is piecewise-constant on hyperrectangles. Central to the algorithm is maximizing a concave energy $\mathcal{E}(\boldsymbol{g})$ over Laguerre cells: volumes are estimated via KLS and separation oracles, the smoothness constant $L$ is shown to be $\mathrm{poly}(n,l,k,1/s)$, and inexact gradient descent achieves the desired accuracy. A key hardness result shows that MLE remains NP-hard for the same class of distributions, which highlights the practical advantage of the Wasserstein approach and motivates future EM-like Wasserstein methods. Open questions remain about extending the results to broader function classes and mixture models.
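To make the first-order conditions concrete, here is a minimal one-dimensional sketch. It is not the paper's algorithm, which works in $l$ dimensions with Laguerre cells. It assumes $f_0$ is the uniform density on $[0,1]$, a single-rectangle instance of the piecewise-constant class. In one dimension the optimal plan is the monotone map, so the $i$-th smallest sample receives the mass of the source quantile interval $[(i-1)/n,\, i/n]$, and setting the derivatives of the squared Wasserstein cost with respect to $\mu$ and $\sigma$ to zero gives a closed form. The function name `fit_shift_scale_uniform` is hypothetical.

```python
from typing import List, Tuple


def fit_shift_scale_uniform(samples: List[float]) -> Tuple[float, float]:
    """Minimize W_2^2 between mu + sigma * U, U ~ Uniform[0, 1], and the
    empirical measure of `samples` (1-D illustration only).

    Assumption: f_0 is uniform on [0, 1], so the source has E[u] = 1/2 and
    E[u^2] = 1/3, and the optimal plan sends the quantile interval
    [(i-1)/n, i/n] to the i-th smallest sample.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a scale")
    ys = sorted(samples)
    mean_y = sum(ys) / n

    # sum_i y_(i) * integral of u over cell i, where that integral
    # equals (2i - 1) / (2 n^2) for the cell [(i-1)/n, i/n].
    cross = sum(y * (2 * i - 1) for i, y in enumerate(ys, start=1)) / (2 * n * n)

    # First-order conditions of sum_i int_cell (mu + sigma*u - y_(i))^2 du:
    #   d/dmu:     mu + sigma/2 - mean_y = 0
    #   d/dsigma:  mu/2 + sigma/3 - cross = 0
    sigma = (cross - 0.5 * mean_y) / (1.0 / 3.0 - 1.0 / 4.0)
    mu = mean_y - 0.5 * sigma
    return mu, sigma
```

Both estimates depend only on the source moments ($1/2$ and $1/3$) and on the pairing of samples with source cells, which mirrors the structure of the closed-form expressions described above. If all samples coincide, the returned scale is zero, so a caller that needs $\sigma > 0$ should check for that case.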

Abstract

Parameter estimation is a fundamental challenge in machine learning, crucial for tasks such as neural network weight fitting and Bayesian inference. This paper focuses on the complexity of estimating translation $\boldsymbol{\mu} \in \mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters for a distribution of the form $\frac{1}{\sigma^l} f_0 \left( \frac{\boldsymbol{x} - \boldsymbol{\mu}}{\sigma} \right)$, where $f_0$ is a known density in $\mathbb{R}^l$, given $n$ samples. We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ within $\text{poly} \left( \frac{1}{\varepsilon} \right)$ time using the Wasserstein distance.

Paper Structure

This paper contains 8 sections, 17 theorems, 131 equations, 1 algorithm.

Key Result

Theorem 2.1

Assume $c(\boldsymbol{x}, \boldsymbol{y}_j)$ has the form $\| \boldsymbol{x} - \boldsymbol{y}_j - \boldsymbol{\mu} \|^2$. Then the primal optimal solution $\pi$ is independent of $\boldsymbol{\mu}$.
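
One standard way to see why this holds (a sketch, not necessarily the paper's own proof) is to expand the cost against an arbitrary feasible plan. The symbols $\alpha$ for the source measure and $\beta = \sum_j b_j \delta_{\boldsymbol{y}_j}$ for the discrete target are notation assumed here for illustration. For any coupling $\pi$ with marginals $\alpha$ and $\beta$,

$$
\int \| \boldsymbol{x} - \boldsymbol{y} - \boldsymbol{\mu} \|^2 \, d\pi(\boldsymbol{x}, \boldsymbol{y})
= \int \| \boldsymbol{x} - \boldsymbol{y} \|^2 \, d\pi(\boldsymbol{x}, \boldsymbol{y})
- 2 \boldsymbol{\mu}^\top \left( \int \boldsymbol{x} \, d\alpha(\boldsymbol{x}) - \sum_j b_j \boldsymbol{y}_j \right)
+ \| \boldsymbol{\mu} \|^2 .
$$

The last two terms depend only on the marginals, so they take the same value for every feasible $\pi$. The minimizer is therefore the minimizer of $\int \| \boldsymbol{x} - \boldsymbol{y} \|^2 \, d\pi$, which does not involve $\boldsymbol{\mu}$.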

Theorems & Definitions (41)

  • Theorem 2.1
  • proof
  • Remark 2.2
  • Theorem 2.3
  • proof
  • Corollary 2.4
  • Theorem 2.5
  • proof
  • Claim 2.6
  • Theorem 2.7
  • ...and 31 more