Universal generalization guarantees for Wasserstein distributionally robust models

Tam Le; Jérôme Malick

TL;DR

This work addresses the problem of providing exact generalization guarantees for Wasserstein distributionally robust optimization (WDRO) models across broad data domains and loss functions, including nonsmooth deep learning objectives. It develops a novel proof framework based on duality and nonsmooth variational analysis and obtains an exact generalization bound in which the Wasserstein radius $\rho$ only needs to scale as $O(1/\sqrt{n})$, avoiding the curse of dimensionality that plagues concentration-based arguments. The authors also derive an excess risk bound, extend the results to entropically regularized WDRO, and provide specialized results for linear and logistic regression, including explicit constants. Overall, the paper delivers dimension-free, exact WDRO generalization guarantees for a wide class of costs and losses, with practical implications for training robust models under distribution shifts and data uncertainty.
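To make the role of duality concrete, the sketch below evaluates the standard Lagrangian dual of a Wasserstein robust risk, which, under the usual regularity conditions, reads $\sup_{W_c(Q,\widehat P_n)\le\rho}\mathbb E_Q[f]=\inf_{\lambda\ge 0}\ \lambda\rho+\frac1n\sum_{i=1}^n\sup_{\zeta}\{f(\zeta)-\lambda c(\xi_i,\zeta)\}$. It is a minimal illustration, not the paper's method: we assume the inner supremum can be taken over a finite candidate set, we minimize over a grid of multipliers $\lambda$, and the function name `wdro_dual_value` and the toy data are hypothetical.

```python
import numpy as np


def wdro_dual_value(f_vals, cost, rho, lambdas):
    """Approximate a Wasserstein robust risk through its dual (hypothetical helper).

    f_vals  : shape (m,), loss f(zeta_j) at hypothetical candidate points zeta_j
    cost    : shape (n, m), cost[i, j] = c(xi_i, zeta_j) for the n data points xi_i
    rho     : radius of the Wasserstein ball
    lambdas : 1D grid of candidate multipliers lambda >= 0
    Returns the smallest dual objective on the grid and the multiplier attaining it.
    """
    # Inner problem sup_zeta [f(zeta) - lambda * c(xi_i, zeta)], solved by
    # enumeration over the finite candidate set, for every lambda at once: (L, n)
    inner = np.max(f_vals[None, None, :] - lambdas[:, None, None] * cost[None, :, :], axis=2)
    # Dual objective lambda * rho + (1/n) sum_i inner_i, convex in lambda
    objective = lambdas * rho + inner.mean(axis=1)
    k = int(np.argmin(objective))
    return objective[k], lambdas[k]


# Toy instance (all values hypothetical): two samples on the real line,
# transport cost |xi - zeta|, and the 1-Lipschitz loss f(zeta) = zeta.
xi = np.array([0.0, 1.0])
zeta = np.linspace(-2.0, 3.0, 501)
cost = np.abs(xi[:, None] - zeta[None, :])
lambdas = np.linspace(0.0, 5.0, 501)

value, lam_star = wdro_dual_value(zeta, cost, rho=0.1, lambdas=lambdas)
print(round(float(value), 6), round(float(lam_star), 6))  # 0.6 1.0
```

In this toy case the robust value equals the empirical mean $0.5$ plus $\rho$ times the Lipschitz constant $1$ of the loss, consistent with the intuition that, under the cost $|\xi-\zeta|$, an adversary with transport budget $\rho$ can raise the mean of a 1-Lipschitz loss by at most $\rho$.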

Abstract

Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that robust models based on the Wasserstein distance enjoy generalization guarantees that do not suffer from the curse of dimensionality. However, these results are either approximate, obtained in specific cases, or based on assumptions that are difficult to verify in practice. In contrast, we establish exact generalization guarantees that cover a wide range of cases, with arbitrary transport costs and parametric loss functions, including deep learning objectives with nonsmooth activations. We complete our analysis with an excess risk bound on the robust objective and an extension to Wasserstein robust models with entropic regularizations.
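Entropic regularization typically turns the hard inner maximum of the Wasserstein dual into a soft maximum (a log-mean-exp with respect to a reference measure). The sketch below assumes one such form, $\inf_{\lambda\ge0}\ \lambda\rho+\frac1n\sum_i T(\lambda)\log\mathbb E_{\zeta\sim\pi_0}\exp\big((f(\zeta)-\lambda c(\xi_i,\zeta))/T(\lambda)\big)$ with temperature $T(\lambda)=\varepsilon+\lambda\tau$ and $\pi_0$ uniform on a finite candidate set; the exact conventions used in the paper may differ, and the names `regularized_wdro_dual_value`, `eps`, `tau` and the toy data are hypothetical.

```python
import numpy as np


def regularized_wdro_dual_value(f_vals, cost, rho, lambdas, eps, tau):
    """Approximate an entropically regularized Wasserstein robust risk via its dual.

    Assumed dual (a sketch, not necessarily the paper's exact convention):
        inf_{lam >= 0} lam*rho + (1/n) sum_i T(lam) * log E_{zeta ~ pi0}[exp((f(zeta) - lam*c(xi_i, zeta)) / T(lam))]
    with temperature T(lam) = eps + lam*tau and pi0 uniform on the candidate points.
    Requires eps > 0 so that T(lam) > 0 on the whole grid.
    """
    temps = eps + tau * lambdas                                            # (L,)
    g = f_vals[None, None, :] - lambdas[:, None, None] * cost[None, :, :]  # (L, n, m)
    a = g / temps[:, None, None]
    # Numerically stable log-mean-exp over the candidate axis (a soft maximum)
    a_max = a.max(axis=2, keepdims=True)
    log_mean_exp = a_max[:, :, 0] + np.log(np.exp(a - a_max).mean(axis=2))  # (L, n)
    objective = lambdas * rho + (temps[:, None] * log_mean_exp).mean(axis=1)
    k = int(np.argmin(objective))
    return objective[k], lambdas[k]


# Same hypothetical toy instance as for the unregularized dual:
# two samples, cost |xi - zeta|, loss f(zeta) = zeta.
xi = np.array([0.0, 1.0])
zeta = np.linspace(-2.0, 3.0, 501)
cost = np.abs(xi[:, None] - zeta[None, :])
lambdas = np.linspace(0.0, 5.0, 501)

reg_value, reg_lam = regularized_wdro_dual_value(
    zeta, cost, rho=0.1, lambdas=lambdas, eps=0.01, tau=0.001
)
print(float(reg_value))
```

Since $T\log\frac1m\sum_j e^{g_j/T}$ lies between $\max_j g_j - T\log m$ and $\max_j g_j$, the regularized value sits slightly below the unregularized one (here $0.6$) and approaches it as `eps` and `tau` shrink.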

Paper Structure (46 sections, 35 theorems, 149 equations, 2 figures)

Introduction
Wasserstein robustness: models and generalization
Contributions and outline
Related work
Notations
On probability spaces.
On function spaces.
Assumptions and examples
Parametric models and loss functions.
Sample space and transport costs.
Main results
Wasserstein robust models
Generalization guarantees of Wasserstein robust linear models
Regularized Wasserstein robust models
Limitations and potential extensions
...and 31 more sections

Key Result

Theorem 3.1

If the general assumptions hold and $\rho_{\operatorname{crit}}>0$, then there exists $\lambda_{\operatorname{low}}>0$ such that, when $n > \frac{16(\alpha + \beta)^2}{\rho_{\operatorname{crit}}^2}$ and $\rho > \frac{\alpha}{\sqrt{n}}$, we have, with probability at least $1 - \delta$, […], where $\alpha$ and $\beta$ are the two constants […]
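The two conditions of Theorem 3.1 are simple arithmetic once $\alpha$, $\beta$ and $\rho_{\operatorname{crit}}$ are known. The sketch below computes the smallest admissible sample size and the lower bound $\alpha/\sqrt{n}$ on the radius; the numerical values of the constants are hypothetical placeholders (their definitions belong to the full statement of the theorem), and the helper names are illustrative only.

```python
import math


def min_sample_size(alpha, beta, rho_crit):
    """Smallest integer n satisfying n > 16 (alpha + beta)^2 / rho_crit^2."""
    threshold = 16.0 * (alpha + beta) ** 2 / rho_crit ** 2
    return math.floor(threshold) + 1


def min_radius(alpha, n):
    """Radius threshold of the theorem: rho must exceed alpha / sqrt(n)."""
    return alpha / math.sqrt(n)


# Hypothetical constants; in the paper they come from the full statement of Theorem 3.1.
alpha, beta, rho_crit = 1.0, 1.0, 0.5
n_min = min_sample_size(alpha, beta, rho_crit)
print(n_min)  # 257
for n in (n_min, 4 * n_min, 16 * n_min):
    # Quadrupling n halves the lower bound on rho: the O(1/sqrt(n)) scaling
    print(n, min_radius(alpha, n))
```

For nonnegative $\alpha$ and $\beta$, the sample-size condition implies $\alpha/\sqrt{n} < \alpha\rho_{\operatorname{crit}}/(4(\alpha+\beta)) \le \rho_{\operatorname{crit}}/4$, so the admissible radii start well below the critical radius.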

Figures (2)

Figure 1: A central object of our analysis: the maximal radius $\rho_{\max}$, defined from the lower envelope of derivatives of $\phi$.
Figure 2: Bounding the empirical dual solution $\lambda^*$ from below amounts to a slope condition (thanks to the convexity of the objective).
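The slope condition of Figure 2 can be illustrated on a toy dual: if a convex function $\phi$ on $[0,\infty)$ has a negative right derivative at $\lambda_0$, then every minimizer satisfies $\lambda^* > \lambda_0$. The sketch below assumes the finite-support dual $\phi(\lambda)=\lambda\rho+\frac1n\sum_i\max_j\{f(\zeta_j)-\lambda c(\xi_i,\zeta_j)\}$, whose right derivative follows from Danskin's formula for a finite maximum of affine functions; the helper names and toy data are hypothetical and this is not the paper's argument.

```python
import numpy as np


def dual_value_and_right_slope(lam, f_vals, cost, rho, tol=1e-12):
    """Dual objective phi(lam) = lam*rho + mean_i max_j [f_j - lam*cost_ij] and its right derivative.

    Each inner max is a maximum of affine functions of lam, so phi is convex and
    piecewise linear; its right derivative is
        rho - mean_i min{cost_ij : j maximizes f_j - lam*cost_ij}.
    """
    g = f_vals[None, :] - lam * cost                   # (n, m)
    g_max = g.max(axis=1, keepdims=True)
    active = g >= g_max - tol                          # maximizers (up to rounding)
    min_active_cost = np.where(active, cost, np.inf).min(axis=1)
    value = lam * rho + g_max.mean()
    right_slope = rho - min_active_cost.mean()
    return value, right_slope


def certifies_lower_bound(lam_low, f_vals, cost, rho):
    """Slope test: by convexity, a negative right slope at lam_low forces lam* > lam_low."""
    _, slope = dual_value_and_right_slope(lam_low, f_vals, cost, rho)
    return bool(slope < 0)


# Hypothetical toy instance: two samples, cost |xi - zeta|, loss f(zeta) = zeta.
xi = np.array([0.0, 1.0])
zeta = np.linspace(-2.0, 3.0, 501)
cost = np.abs(xi[:, None] - zeta[None, :])
rho = 0.1

print(certifies_lower_bound(0.5, zeta, cost, rho))  # True
print(certifies_lower_bound(1.0, zeta, cost, rho))  # False

# Cross-check with a brute-force minimization over a grid of multipliers
lambdas = np.linspace(0.0, 5.0, 501)
values = [dual_value_and_right_slope(l, zeta, cost, rho)[0] for l in lambdas]
lam_star = float(lambdas[int(np.argmin(values))])
print(lam_star)  # 1.0
```

The test certifies $\lambda^* > 0.5$ but not $\lambda^* > 1$, and the brute-force minimizer $\lambda^* = 1$ agrees: a slope condition gives a lower bound, not the exact location of the minimizer.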

Theorems & Definitions (62)

Theorem 3.1: Generalization guarantee for Wasserstein robust models
Proposition 3.1: Excess risk for Wasserstein robust models
Remark 3.1: The critical radius as a degeneracy threshold
Proposition 3.2: Linear models dual bound and critical radius
Theorem 3.2: Generalization for double regularization
Proposition 3.3: Excess risk for doubly regularized robust models
Lemma 4.1
Lemma 4.2
Lemma 4.3
Definition A.1: Lower and upper semicontinuity infinite
...and 52 more
