Minimax Statistical Learning with Wasserstein Distances
Jaeho Lee, Maxim Raginsky
TL;DR
The paper introduces a local minimax learning framework whose ambiguity sets are Wasserstein balls, aimed at distribution drift and domain shift. It develops generalization bounds for minimax empirical risk minimization (ERM) in data-dependent, Lipschitz, and minimal-assumption regimes, and gives concrete example bounds for regression and RKHS classes. A key contribution is a transport-based domain adaptation bound that relies on estimating the Wasserstein distance between the source and target distributions from unlabeled data. The framework connects optimal transport, Kantorovich duality, and empirical process theory, and it includes procedures for selecting the ambiguity radius from data.
Abstract
As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove generalization bounds that involve the covering-number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for transport-based domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples.
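The objects named in the abstract can be written out as a short sketch. The notation here is an assumption introduced for illustration and may differ from the symbols used in the body of the paper: Z is the instance space with metric d, Γ(P, Q) is the set of couplings of P and Q, ρ ≥ 0 is the radius of the Wasserstein ball, P_n is the empirical distribution of the training sample, and each f in the class F is treated as a loss evaluated at an instance.

```latex
% Assumed notation (illustrative, not taken verbatim from the paper):
% Z is the instance space with metric d, P and Q are probability
% distributions on Z, \Gamma(P, Q) is the set of couplings of P and Q,
% and \mathcal{F} is the hypothesis class.
\begin{align*}
% p-Wasserstein distance between P and Q
W_p(P, Q) &= \Bigl( \inf_{M \in \Gamma(P, Q)}
    \mathbb{E}_{(Z, Z') \sim M}\bigl[ d^p(Z, Z') \bigr] \Bigr)^{1/p}, \\
% worst-case (local minimax) risk over the Wasserstein ball of radius \rho
R_{\rho,p}(P, f) &= \sup_{Q \,:\, W_p(P, Q) \le \rho}
    \mathbb{E}_{Q}\bigl[ f(Z) \bigr], \\
% minimax ERM: minimize the worst-case risk around the empirical distribution
\hat{f} &\in \operatorname*{arg\,min}_{f \in \mathcal{F}} R_{\rho,p}(P_n, f).
\end{align*}
```

With ρ = 0 the ball contains only P_n, so the last line reduces to standard ERM. Larger radii trade fit on the training sample for protection against distributions within transport distance ρ of it.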
