
Statistical Guarantees for Distributionally Robust Optimization with Optimal Transport and OT-Regularized Divergences

Jeremiah Birrell, Xiaoxi Shen

Abstract

We study finite-sample statistical performance guarantees for distributionally robust optimization (DRO) with optimal transport (OT) and OT-regularized divergence model neighborhoods. Specifically, we derive concentration inequalities for supervised learning via DRO-based adversarial training, as commonly employed to enhance the adversarial robustness of machine learning models. Our results apply to a wide range of OT cost functions, beyond the $p$-Wasserstein case studied by previous authors. In particular, our results are the first to: 1) cover soft-constraint norm-ball OT cost functions, which have been shown empirically to enhance robustness when used in adversarial training; and 2) apply to the combination of adversarial sample generation and adversarial reweighting induced by OT-regularized $f$-divergence model neighborhoods, a reweighting mechanism that has been shown empirically to further improve performance. In addition, even in the $p$-Wasserstein case, our bounds exhibit better behavior as a function of the DRO neighborhood size than previous results when applied to the adversarial setting.
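
In this formulation (restated here for orientation; $P_n$ denotes the empirical measure of the training sample and $\mathcal{L}_\theta$ the per-sample loss of a model with parameters $\theta$), DRO-based adversarial training solves a min-max problem over an OT neighborhood of the data distribution:

$$\inf_\theta\,\sup_{Q:\,C_c(Q,P_n)\leq r}E_Q[\mathcal{L}_\theta],\qquad C_c(Q,P)\coloneqq\inf_{\pi\in\Pi(Q,P)}\int c\,d\pi,$$

where $c$ is the OT cost function, $\Pi(Q,P)$ is the set of couplings of $Q$ and $P$, and $r>0$ is the neighborhood size; the inner supremum generates worst-case (adversarial) perturbations of the training samples.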


Paper Structure

This paper contains 19 sections, 11 theorems, 116 equations.

Key Result

Proposition 2.1

Let $c$ be a cost function that satisfies $c(z,z)=0$ for all $z\in\mathcal{Z}$, let $\mathcal{L}:\mathcal{Z}\to\mathbb{R}$ be measurable and bounded below, and let $P\in\mathcal{P}(\mathcal{Z})$. Then for all $r>0$ we have

$$\sup_{Q\in\mathcal{P}(\mathcal{Z}):\,C_c(P,Q)\leq r}E_Q[\mathcal{L}]=\inf_{\lambda\geq 0}\left\{\lambda r+E_P\!\left[\mathcal{L}^{c,\lambda}\right]\right\},$$

where

$$\mathcal{L}^{c,\lambda}(z)\coloneqq\sup_{\tilde{z}\in\mathcal{Z}}\{\mathcal{L}(\tilde{z})-\lambda c(z,\tilde{z})\}$$

and we employ the convention $\infty-\infty\coloneqq-\infty$.
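
On a finite sample space, this duality reduces to linear-programming duality, which makes a direct numerical check straightforward: the primal side is a small LP over couplings $\pi$, and the dual side is a one-dimensional minimization over $\lambda$. The Python sketch below compares the two sides on a toy example; the grid, loss, cost, and radius are illustrative choices, not taken from the paper.

import numpy as np
from scipy.optimize import linprog

# Finite toy setup (illustrative choices): P is uniform on four grid points,
# Q ranges over distributions supported on a grid of candidate points z_j.
z = np.linspace(0.0, 1.0, 41)            # candidate support of Q
x = np.array([0.1, 0.35, 0.6, 0.9])      # support of P (all lie on the grid)
p = np.full(len(x), 1.0 / len(x))        # P weights
L = np.sin(3 * z) + z**2                 # a bounded-below loss on the grid
c = (x[:, None] - z[None, :]) ** 2       # quadratic OT cost c(x_i, z_j)
r = 0.05                                 # neighborhood size

# Primal: sup { E_Q[L] : C_c(P,Q) <= r }, an LP over couplings pi_{ij}
# with row marginals p_i and total transport cost at most r.
n, m = c.shape
A_eq = np.zeros((n, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0     # sum_j pi_ij = p_i
res = linprog(np.tile(-L, n),            # maximize sum_ij pi_ij * L_j
              A_ub=c.reshape(1, -1), b_ub=[r],
              A_eq=A_eq, b_eq=p, bounds=(0, None), method="highs")
primal = -res.fun

# Dual: inf_{lam >= 0} { lam * r + E_P[ L^{c,lam} ] }, where
# L^{c,lam}(x_i) = max_j { L_j - lam * c_ij }; minimized by grid search.
lams = np.linspace(0.0, 50.0, 2001)
dual = min(lam * r + p @ np.max(L[None, :] - lam * c, axis=1) for lam in lams)

print(f"primal = {primal:.5f}, dual = {dual:.5f}")  # the two sides agree

Since the discretized primal and dual form an exact LP dual pair, the two printed values should agree up to the resolution of the $\lambda$ grid.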

Theorems & Definitions (28)

  • Proposition 2.1
  • Remark 2.2
  • Proposition 2.3
  • Remark 3.2
  • Lemma 3.3
  • Remark 3.4
  • Proof
  • Lemma 3.6
  • Proof
  • Theorem 3.8
  • ...and 18 more