Efficient Distribution Learning with Error Bounds in Wasserstein Distance
Eduardo Figueiredo, Steven Adams, Luca Laurenti
TL;DR
This work addresses learning an unknown distribution $\mathbb{P}$ from finite samples by constructing a discrete surrogate $\widehat{\mathbb{P}}$ and providing non-asymptotic, data-driven bounds on the Wasserstein distance $\mathbb{W}_{\rho}(\mathbb{P}, \widehat{\mathbb{P}})$. It introduces a framework that combines optimal transport, quantization, and concentration inequalities to bound this distance via a tractable mixed-integer linear program whose complexity scales with the discrete support size $M$. A data-driven partition construction, paired with Lloyd-type clustering, yields compact discrete approximations that adapt to the observed samples and come with high-confidence error guarantees. Empirically, the method yields substantially tighter Wasserstein bounds and smaller supports than state-of-the-art approaches across synthetic and real datasets (e.g., MiniBooNE and OCTMNIST), enabling efficient uncertainty propagation and distributionally robust optimization in practice.
Abstract
The Wasserstein distance has emerged as a key metric for quantifying distances between probability distributions, with applications in machine learning, control theory, decision theory, and biological systems. Consequently, learning an unknown distribution with non-asymptotic, easy-to-compute error bounds in Wasserstein distance has become a fundamental problem. In this paper, we devise a novel algorithmic and theoretical framework to approximate an unknown probability distribution $\mathbb{P}$ from a finite set of samples by a discrete distribution $\widehat{\mathbb{P}}$ while bounding the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$. Our framework leverages optimal transport, nonlinear optimization, and concentration inequalities. In particular, we show that, even though $\mathbb{P}$ is unknown, the Wasserstein distance between $\mathbb{P}$ and $\widehat{\mathbb{P}}$ can be efficiently bounded with high confidence by solving a tractable optimization problem (a mixed-integer linear program) whose size depends only on the size of the support of $\widehat{\mathbb{P}}$. This enables us to develop clustering algorithms that select the support of $\widehat{\mathbb{P}}$ so as to minimize the resulting Wasserstein error bound. On a set of benchmarks, we demonstrate that our approach outperforms comparable state-of-the-art methods, generally returning approximating distributions with substantially smaller support and tighter error bounds.
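The abstract does not spell out the clustering step, so the following is only a minimal sketch, assuming a Euclidean ground distance and plain Lloyd (k-means) iterations, of how $N$ samples can be compressed into an $M$-atom distribution $\widehat{\mathbb{P}}$ and how $\mathbb{W}_{\rho}$ between the empirical distribution $\mathbb{P}_N$ and $\widehat{\mathbb{P}}$ is then computed. It is not the authors' algorithm: it omits the data-driven partition and the mixed-integer linear program, and it measures only $\mathbb{W}_{\rho}(\mathbb{P}_N, \widehat{\mathbb{P}})$, not the high-confidence bound on $\mathbb{W}_{\rho}(\mathbb{P}, \widehat{\mathbb{P}})$. The names `lloyd_quantize` and `empirical_wasserstein` are hypothetical.

```python
import numpy as np


def lloyd_quantize(samples, M, n_iter=50, seed=0):
    """Hypothetical helper: Lloyd's algorithm (k-means) compressing N samples
    of shape (N, d) into M weighted atoms, i.e. P_hat = sum_j w_j * delta_{c_j}.
    Requires M <= N. Returns centers (M, d) and weights (M,) summing to 1."""
    rng = np.random.default_rng(seed)
    x = np.asarray(samples, dtype=float)
    # Initialize the atoms with M distinct samples chosen at random.
    centers = x[rng.choice(len(x), size=M, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: send each sample to its nearest atom.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each non-empty cluster's atom to the cluster mean.
        for j in range(M):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    # Final assignment so the weights match the returned atoms.
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    weights = np.bincount(labels, minlength=M) / len(x)
    return centers, weights


def empirical_wasserstein(samples, centers, rho=2.0):
    """Hypothetical helper: W_rho between the empirical distribution of the
    samples and the atoms, assuming each atom's weight is the fraction of
    samples whose nearest atom it is (as returned by lloyd_quantize)."""
    x = np.asarray(samples, dtype=float)
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    # Each sample moves its mass 1/N to its nearest atom.
    return float(np.mean(dists.min(axis=1) ** rho) ** (1.0 / rho))


rng = np.random.default_rng(1)
samples = rng.normal(size=(1000, 2))
centers, weights = lloyd_quantize(samples, M=10)
print(weights.sum(), empirical_wasserstein(samples, centers, rho=2.0))
```

Because every sample is transported to its nearest atom and the weights are exactly the cluster proportions, this nearest-atom coupling is optimal: any other coupling with the same marginals moves some mass farther. The returned value is therefore $\mathbb{W}_{\rho}(\mathbb{P}_N, \widehat{\mathbb{P}})$ itself. By the triangle inequality, a bound on $\mathbb{W}_{\rho}(\mathbb{P}, \widehat{\mathbb{P}})$ would additionally require controlling $\mathbb{W}_{\rho}(\mathbb{P}, \mathbb{P}_N)$, which is where the concentration arguments mentioned above come in.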
