Mean and Variance Estimation Complexity in Arbitrary Distributions via Wasserstein Minimization

Valentio Iverson; Stephen Vavasis

TL;DR

The paper addresses the problem of estimating the translation $\boldsymbol{\mu}$ and scale $\sigma$ for distributions of the form $\frac{1}{\sigma^l} f_0\left(\frac{\boldsymbol{x}-\boldsymbol{\mu}}{\sigma}\right)$ using the Wasserstein distance. It contrasts the intractability of MLE (NP-hard in general) with tractable Wasserstein minimization on a structured class of piecewise-constant densities. The authors formulate the estimation as a semi-discrete optimal transport problem and derive explicit first-order conditions that yield closed-form expressions for $\boldsymbol{\mu}^*$ and $\sigma^*$ in terms of moments of the source measure and the transport plan. They then present a randomized polynomial-time algorithm that approximates these parameters when the source is piecewise-constant on hyperrectangles. Central to the algorithm is maximizing a concave energy $\mathcal{E}(\boldsymbol{g})$ over Laguerre cells: cell volumes are estimated via KLS and separation oracles, the smoothness constant $L$ is shown to be $\mathrm{poly}(n,l,k,1/s)$, and inexact gradient descent achieves the desired accuracy. A key hardness result shows that MLE remains NP-hard for the same class of distributions, which highlights the practical advantage of the Wasserstein approach and motivates future EM-like Wasserstein methods. The paper also notes open questions about extending the results to broader function classes and mixture models.

Abstract

Parameter estimation is a fundamental challenge in machine learning, crucial for tasks such as neural network weight fitting and Bayesian inference. This paper focuses on the complexity of estimating translation $\boldsymbol{\mu} \in \mathbb{R}^l$ and shrinkage $\sigma \in \mathbb{R}_{++}$ parameters for a distribution of the form $\frac{1}{\sigma^l} f_0 \left( \frac{\boldsymbol{x} - \boldsymbol{\mu}}{\sigma} \right)$, where $f_0$ is a known density in $\mathbb{R}^l$, given $n$ samples. We highlight that while the problem is NP-hard for Maximum Likelihood Estimation (MLE), it is possible to obtain $\varepsilon$-approximations for arbitrary $\varepsilon > 0$ within $\text{poly} \left( \frac{1}{\varepsilon} \right)$ time using the Wasserstein distance.
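To make the estimation problem concrete, here is a minimal sketch restricted to one dimension ($l = 1$), where the optimal coupling between $f_0$ and the sorted samples is the monotone (quantile) one and the Wasserstein-2 minimizer over $(\mu, \sigma)$ reduces to a least-squares fit on quantiles. This is an illustration under those assumptions, not the paper's algorithm, which works in $\mathbb{R}^l$ with Laguerre cells. The function names, the choice of $f_0$ as the uniform density on $[-1/2, 1/2]$, and the quadrature parameter `points_per_bin` are all hypothetical.

```python
import numpy as np


def fit_location_scale_w2_1d(samples, base_quantile, points_per_bin=64):
    """1-D sketch: minimize W2 between the law of mu + sigma * X (X ~ f0)
    and the empirical measure of `samples`, assuming sigma > 0.

    `base_quantile` is the quantile function of f0 (an assumed input)."""
    y = np.sort(np.asarray(samples, dtype=float))
    n = y.size
    # Midpoint quadrature nodes on (0, 1), grouped into n equal-mass bins;
    # bin i is transported to the i-th smallest sample.
    total = n * points_per_bin
    u = (np.arange(total) + 0.5) / total
    q = base_quantile(u).reshape(n, points_per_bin)
    bin_means = q.mean(axis=1)      # average of Q0 over the i-th quantile bin
    m0 = q.mean()                   # first moment of f0 (quadrature estimate)
    m2 = (q ** 2).mean()            # second moment of f0 (quadrature estimate)
    cross = np.mean(y * bin_means)  # integral of (matched sample) * Q0(u) du
    # First-order conditions of the quadratic objective in (mu, sigma).
    sigma = (cross - y.mean() * m0) / (m2 - m0 ** 2)
    mu = y.mean() - sigma * m0
    return mu, sigma


def uniform_quantile(u):
    """Quantile function of the uniform density on [-1/2, 1/2]."""
    return np.asarray(u) - 0.5


rng = np.random.default_rng(0)
samples = 3.0 + 2.0 * rng.uniform(-0.5, 0.5, size=5000)
mu_hat, sigma_hat = fit_location_scale_w2_1d(samples, uniform_quantile)
print(mu_hat, sigma_hat)
```

The estimates depend only on the moments of $f_0$ and on which part of its mass is matched to each sample. In higher dimensions the sorted matching is no longer available, which is where the semi-discrete transport machinery comes in.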

Paper Structure (8 sections, 17 theorems, 131 equations, 1 algorithm)

Introduction
Parameter Estimation in Wasserstein Minimization
Setup and Explicit Formulas for Parameter Estimation
Polynomial-Time Algorithm for Parameter Estimation
Necessity of Parameters in Smoothness Constants
Hardness of Finding a Single Optimal Center in MLE
Future Work
Acknowledgements

Key Result

Theorem 2.1

Assume $c(\boldsymbol{x}, \boldsymbol{y}_j)$ has the form $\| \boldsymbol{x} - \boldsymbol{y}_j - \boldsymbol{\mu} \|^2$. Then the primal optimal solution $\pi$ is independent of $\boldsymbol{\mu}$.
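
One standard way to see why a statement of this kind holds is to expand the squared cost. This is a sketch and not necessarily the paper's proof. The symbols $\alpha$ (source measure) and $\beta$ (target measure supported on the points $\boldsymbol{y}_j$) are notation assumed here, and both are assumed to have finite second moments. For any coupling $\pi$ of $\alpha$ and $\beta$,

$$
\int \| \boldsymbol{x} - \boldsymbol{y} - \boldsymbol{\mu} \|^2 \, d\pi
= \int \| \boldsymbol{x} - \boldsymbol{y} \|^2 \, d\pi
- 2 \boldsymbol{\mu}^\top \left( \int \boldsymbol{x} \, d\alpha - \int \boldsymbol{y} \, d\beta \right)
+ \| \boldsymbol{\mu} \|^2 .
$$

The last two terms depend on $\pi$ only through its marginals, which are fixed, so they are constant over all feasible couplings. The minimizer of the left-hand side therefore coincides with the minimizer of $\int \| \boldsymbol{x} - \boldsymbol{y} \|^2 \, d\pi$ for every $\boldsymbol{\mu}$.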

Theorems & Definitions (41)

Theorem 2.1
proof
Remark 2.2
Theorem 2.3
proof
Corollary 2.4
Theorem 2.5
proof
Claim 2.6
Theorem 2.7
...and 31 more
