Universal generalization guarantees for Wasserstein distributionally robust models
Tam Le, Jérôme Malick
TL;DR
This work addresses the problem of providing exact generalization guarantees for Wasserstein distributionally robust optimization (WDRO) models across broad data domains and loss functions, including nonsmooth deep learning objectives. It develops a proof framework based on duality and nonsmooth variational analysis to obtain an exact generalization bound with the robustness radius $\rho$ scaling as $O(1/\sqrt{n})$, where $n$ is the number of samples, avoiding the curse of dimensionality that plagues concentration-based arguments. The authors also derive an excess risk bound, extend the results to entropically regularized WDRO, and provide specialized results for linear and logistic regression, including explicit constants. The resulting guarantees are dimension-free and hold for a wide class of transport costs and losses, with practical implications for training robust models under distribution shifts and data uncertainty.
Abstract
Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that robust models based on the Wasserstein distance have generalization guarantees that do not suffer from the curse of dimensionality. However, these results are either approximate, obtained in specific cases, or based on assumptions that are difficult to verify in practice. In contrast, we establish exact generalization guarantees that cover a wide range of cases, with arbitrary transport costs and parametric loss functions, including deep learning objectives with nonsmooth activations. We complete our analysis with an excess bound on the robust objective and an extension to Wasserstein robust models with entropic regularizations.
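
For readers less familiar with the setting, the objects discussed above can be written out as follows. This is a sketch in standard WDRO notation, which is an assumption here and may differ from the paper's own: $\widehat{P}_n$ denotes the empirical distribution of $n$ samples drawn from the true distribution $P$, $f_\theta$ is the parametric loss, $c$ is the transport cost, $W_c$ is the associated optimal transport cost, and $\rho$ is the radius of the ambiguity set. The robust objective is the worst-case expected loss over distributions close to the data:

$$
\widehat{R}_\rho(\theta) = \sup_{Q \,:\, W_c(\widehat{P}_n, Q) \le \rho} \mathbb{E}_{\zeta \sim Q}\left[ f_\theta(\zeta) \right].
$$

Under mild conditions, the usual duality result for Wasserstein robust objectives rewrites this as a one-dimensional minimization over a multiplier $\lambda$:

$$
\widehat{R}_\rho(\theta) = \inf_{\lambda \ge 0} \left\{ \lambda \rho + \mathbb{E}_{\xi \sim \widehat{P}_n}\left[ \sup_{\zeta} \left\{ f_\theta(\zeta) - \lambda\, c(\xi, \zeta) \right\} \right] \right\}.
$$

Reading "exact" as "without an additive error term", a guarantee of the kind described would state that, with high probability and uniformly over $\theta$,

$$
\mathbb{E}_{\xi \sim P}\left[ f_\theta(\xi) \right] \le \widehat{R}_\rho(\theta),
$$

once $\rho$ is of order $1/\sqrt{n}$. The precise constants, probability level, and assumptions are those of the paper and are not reproduced here.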
