Universal generalization guarantees for Wasserstein distributionally robust models

Tam Le, Jérôme Malick

TL;DR

This work addresses the problem of providing exact generalization guarantees for Wasserstein distributionally robust models (WDRO) across broad data domains and loss functions, including nonsmooth deep learning objectives. It develops a novel proof framework based on duality and nonsmooth variational analysis to obtain an exact generalization bound with $\rho$ scaling as $O(1/\sqrt{n})$, avoiding the curse of dimensionality that plagues concentration-based arguments. The authors also derive an excess risk bound, extend the results to entropically regularized WDRO, and provide specialized results for linear and logistic regression, including explicit constants. Overall, the paper delivers dimension-free, exact WDRO generalization guarantees for a wide class of costs and losses, with practical implications for training robust models under distributional shifts and data uncertainty.

Abstract

Distributionally robust optimization has emerged as an attractive way to train robust machine learning models, capturing data uncertainty and distribution shifts. Recent statistical analyses have proved that robust models based on the Wasserstein distance have generalization guarantees that do not suffer from the curse of dimensionality. However, these results are either approximate, obtained in specific cases, or based on assumptions that are difficult to verify in practice. In contrast, we establish exact generalization guarantees that cover a wide range of cases, with arbitrary transport costs and parametric loss functions, including deep learning objectives with nonsmooth activations. We complete our analysis with an excess risk bound on the robust objective and an extension to Wasserstein robust models with entropic regularization.
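
For orientation, the robust objective discussed in the abstract is commonly written as below. This is the standard WDRO formulation, not a quotation from the paper; the symbols $\widehat{P}_n$ (empirical distribution of the $n$ samples), $W_c$ (optimal transport cost induced by a ground cost $c$), $f_\theta$ (parametric loss), and $\rho$ (radius of the ambiguity set) are notation assumed here for illustration.

$$
\widehat{R}_\rho(\theta) \;=\; \sup_{Q \,:\, W_c(\widehat{P}_n, Q) \le \rho} \; \mathbb{E}_{\xi \sim Q}\big[f_\theta(\xi)\big]
$$

In this notation, an exact generalization guarantee is understood as a statement that, with high probability, $\widehat{R}_\rho(\theta)$ upper-bounds the true risk $\mathbb{E}_{\xi \sim P}[f_\theta(\xi)]$ without an additive error term, where $P$ denotes the (assumed) data-generating distribution.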

Paper Structure (46 sections, 35 theorems, 149 equations, 2 figures)

Key Result

Theorem 3.1

If the general assumptions hold and $\rho_{\operatorname{crit}}>0$, then there exists $\lambda_{\operatorname{low}}>0$ such that, when $n > \frac{16(\alpha + \beta)^2}{\rho_{\operatorname{crit}}^2}$ and $\rho > \frac{\alpha}{\sqrt{n}}$, the generalization guarantee holds with probability at least $1 - \delta$, where $\alpha$ and $\beta$ are the two constants …
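
To make the two conditions in this excerpt concrete, here is a minimal Python sketch that computes the smallest admissible sample size and the lower threshold on the radius. It assumes that numerical values for $\alpha$, $\beta$, and $\rho_{\operatorname{crit}}$ are already available; the excerpt does not give their definitions, so the values used below are purely hypothetical. The function names are illustrative, and the full theorem may impose further conditions that are not shown here.

```python
import math


def min_sample_size(alpha: float, beta: float, rho_crit: float) -> int:
    """Smallest integer n satisfying n > 16 * (alpha + beta)^2 / rho_crit^2."""
    if rho_crit <= 0:
        raise ValueError("rho_crit must be positive, as the statement requires")
    bound = 16.0 * (alpha + beta) ** 2 / rho_crit ** 2
    # The inequality is strict, so move to the next integer above the bound.
    return math.floor(bound) + 1


def radius_threshold(alpha: float, n: int) -> float:
    """Value alpha / sqrt(n) that the radius rho must strictly exceed."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return alpha / math.sqrt(n)


def conditions_hold(n: int, rho: float, alpha: float, beta: float,
                    rho_crit: float) -> bool:
    """Check only the two conditions quoted in the excerpt."""
    enough_samples = n > 16.0 * (alpha + beta) ** 2 / rho_crit ** 2
    radius_large_enough = rho > radius_threshold(alpha, n)
    return enough_samples and radius_large_enough


if __name__ == "__main__":
    # Hypothetical constants, chosen only to illustrate the arithmetic.
    alpha, beta, rho_crit = 1.0, 1.0, 0.5
    n = 400
    print("minimum n:", min_sample_size(alpha, beta, rho_crit))
    print("radius threshold at n = 400:", radius_threshold(alpha, n))
    print("conditions hold for rho = 0.1:",
          conditions_hold(n, 0.1, alpha, beta, rho_crit))
```

Since the threshold $\alpha/\sqrt{n}$ shrinks as $n$ grows, the admissible radius can be taken of order $1/\sqrt{n}$, which matches the scaling mentioned in the TL;DR.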

Figures (2)

  • Figure 1: A central object of our analysis: the maximal radius $\rho_{\max}$, defined from the lower envelope of derivatives of $\phi$.
  • Figure 2: Bounding the empirical dual solution $\lambda^*$ from below can be expressed as a slope condition (thanks to the convexity of the objective).

Theorems & Definitions (62)

  • Theorem 3.1: Generalization guarantee for Wasserstein robust models
  • Proposition 3.1: Excess risk for Wasserstein robust models
  • Remark 3.1: The critical radius as a degeneracy threshold
  • Proposition 3.2: Linear models dual bound and critical radius
  • Theorem 3.2: Generalization for double regularization
  • Proposition 3.3: Excess risk for doubly regularized robust models
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • Definition A.1: Lower and upper semicontinuity infinite
  • ...and 52 more