Sample complexity of optimal transport barycenters with discrete support
Léo Portales, Edouard Pauwels, Elsa Cazelles
TL;DR
The paper establishes statistical guarantees for empirical sparse optimal transport (OT) barycenters, deriving uniform generalization bounds of order $O(\sqrt{N/n})$, where $N$ is the maximum support size of the barycenter and $n$ is the number of samples drawn from each target measure. The framework covers several OT divergences, including $W_p^p$, $W_{\epsilon,p}^p$, $SW_p^p$, and max-$SW_p^p$, and relies on semi-dual representations and empirical process theory to control the dual variables. The $O(\sqrt{N/n})$ rate holds uniformly in the number of measures $L$ and the regularization parameter $\epsilon$, with dimension-free constants. The authors discuss tightness via lower bounds, the dependence on $N$, and implications for K-means and constrained K-means. The results give practical guidance on the sample complexity of sparse OT barycenters, including in high-dimensional settings and when working with discrete or discretized target measures.
Abstract
Computational implementation of optimal transport barycenters for a set of target probability measures requires some form of approximation, a widespread solution being empirical approximation of the measures. We provide $O(\sqrt{N/n})$ statistical generalization bounds for the empirical sparse optimal transport barycenter problem, where $N$ is the maximum cardinality of the barycenter support (sparse support) and $n$ is the sample size of the empirical approximations of the target measures. Our analysis covers various optimal transport divergences, including Wasserstein, Sinkhorn and Sliced-Wasserstein. We discuss the application of our results to specific settings including K-means, constrained K-means, and free- and fixed-support Wasserstein barycenters.
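To make the K-means setting concrete, the following is a minimal sketch and not the authors' code. It uses the standard fact that, for a single target measure and squared Euclidean cost, the $W_2^2$ distance to the best measure supported on $N$ given points (with free weights) is the mean squared distance to the nearest point. The function names `kmeans_objective` and `empirical_gap`, the Gaussian target, and the sample sizes are all assumptions chosen for illustration. The sketch probes the gap only for one fixed set of centers, so it illustrates the dependence on $n$ but does not verify the uniform bound.

```python
import numpy as np


def kmeans_objective(samples, centers):
    """Mean squared distance from each sample to its nearest center.

    samples: array of shape (n, d); centers: array of shape (N, d).
    """
    # Pairwise squared distances, shape (n, N).
    diffs = samples[:, None, :] - centers[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    return sq_dists.min(axis=1).mean()


def empirical_gap(n, centers, reference_value, rng):
    """Absolute gap between the n-sample objective and a reference value.

    The target is a standard Gaussian (an illustrative assumption).
    """
    samples = rng.standard_normal((n, centers.shape[1]))
    return abs(kmeans_objective(samples, centers) - reference_value)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, N = 2, 5  # hypothetical dimension and support size
    centers = rng.standard_normal((N, d))

    # Large-sample proxy for the population objective.
    reference = kmeans_objective(rng.standard_normal((50_000, d)), centers)

    for n in (100, 1_000, 10_000):
        # Average over repetitions to smooth out sampling noise.
        gaps = [empirical_gap(n, centers, reference, rng) for _ in range(20)]
        print(n, float(np.mean(gaps)))
```

Running the script prints the average gap for each sample size. Probing the supremum over all sets of $N$ centers, which is what the stated bound controls, would require optimizing over `centers` as well.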
