
Statistical Guarantees for Distributionally Robust Optimization with Optimal Transport and OT-Regularized Divergences

Jeremiah Birrell, Xiaoxi Shen

Abstract

We study finite-sample statistical performance guarantees for distributionally robust optimization (DRO) with optimal transport (OT) and OT-regularized divergence model neighborhoods. Specifically, we derive concentration inequalities for supervised learning via DRO-based adversarial training, as commonly employed to enhance the adversarial robustness of machine learning models. Our results apply to a wide range of OT cost functions, beyond the $p$-Wasserstein case studied by previous authors. In particular, our results are the first to: 1) cover soft-constraint norm-ball OT cost functions, which have been shown empirically to enhance robustness when used in adversarial training; and 2) apply to the combination of adversarial sample generation and adversarial reweighting induced by OT-regularized $f$-divergence model neighborhoods, a reweighting mechanism that has been shown empirically to further improve performance. In addition, even in the $p$-Wasserstein case, our bounds exhibit better behavior as a function of the DRO neighborhood size than previous results when applied to the adversarial setting.
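
In this formulation (restated here for orientation; $P_n$ denotes the empirical measure of the training sample and $\mathcal{L}_\theta$ the per-sample loss of a model with parameters $\theta$), DRO-based adversarial training solves a min-max problem over an OT neighborhood of the data distribution:

$$\inf_\theta\,\sup_{Q:\,C_c(Q,P_n)\leq r}E_Q[\mathcal{L}_\theta],\qquad C_c(Q,P)\coloneqq\inf_{\pi\in\Pi(Q,P)}\int c\,d\pi,$$

where $c$ is the OT cost function, $\Pi(Q,P)$ is the set of couplings of $Q$ and $P$, and $r>0$ is the neighborhood size; the inner supremum generates worst-case (adversarial) perturbations of the training samples.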


Paper Structure

This paper contains 19 sections, 11 theorems, 116 equations.

Key Result

Proposition 2.1

Let $c$ be a cost function that satisfies $c(z,z)=0$ for all $z\in\mathcal{Z}$, let $\mathcal{L}:\mathcal{Z}\to\mathbb{R}$ be measurable and bounded below, and let $P\in\mathcal{P}(\mathcal{Z})$. Then for all $r>0$ we have

$$\sup_{Q\in\mathcal{P}(\mathcal{Z}):\,C_c(P,Q)\leq r}E_Q[\mathcal{L}]=\inf_{\lambda\geq 0}\left\{\lambda r+E_P\!\left[\mathcal{L}^{c,\lambda}\right]\right\},$$

where

$$\mathcal{L}^{c,\lambda}(z)\coloneqq\sup_{\tilde{z}\in\mathcal{Z}}\{\mathcal{L}(\tilde{z})-\lambda c(z,\tilde{z})\}$$

and we employ the convention $\infty-\infty\coloneqq-\infty$.
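
On a finite sample space, this duality reduces to linear-programming duality, which makes a direct numerical check straightforward: the primal side is a small LP over couplings $\pi$, and the dual side is a one-dimensional minimization over $\lambda$. The Python sketch below compares the two sides on a toy example; the grid, loss, cost, and radius are illustrative choices, not taken from the paper.

import numpy as np
from scipy.optimize import linprog

# Finite toy setup (illustrative choices): P is uniform on four grid points,
# Q ranges over distributions supported on a grid of candidate points z_j.
z = np.linspace(0.0, 1.0, 41)            # candidate support of Q
x = np.array([0.1, 0.35, 0.6, 0.9])      # support of P (all lie on the grid)
p = np.full(len(x), 1.0 / len(x))        # P weights
L = np.sin(3 * z) + z**2                 # a bounded-below loss on the grid
c = (x[:, None] - z[None, :]) ** 2       # quadratic OT cost c(x_i, z_j)
r = 0.05                                 # neighborhood size

# Primal: sup { E_Q[L] : C_c(P,Q) <= r }, an LP over couplings pi_{ij}
# with row marginals p_i and total transport cost at most r.
n, m = c.shape
A_eq = np.zeros((n, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0     # sum_j pi_ij = p_i
res = linprog(np.tile(-L, n),            # maximize sum_ij pi_ij * L_j
              A_ub=c.reshape(1, -1), b_ub=[r],
              A_eq=A_eq, b_eq=p, bounds=(0, None), method="highs")
primal = -res.fun

# Dual: inf_{lam >= 0} { lam * r + E_P[ L^{c,lam} ] }, where
# L^{c,lam}(x_i) = max_j { L_j - lam * c_ij }; minimized by grid search.
lams = np.linspace(0.0, 50.0, 2001)
dual = min(lam * r + p @ np.max(L[None, :] - lam * c, axis=1) for lam in lams)

print(f"primal = {primal:.5f}, dual = {dual:.5f}")  # the two sides agree

Since the discretized primal and dual form an exact LP dual pair, the two printed values should agree up to the resolution of the $\lambda$ grid.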

Theorems & Definitions (28)

  • Proposition 2.1
  • Remark 2.2
  • Proposition 2.3
  • Remark 3.2
  • Lemma 3.3
  • Remark 3.4
  • Proof
  • Lemma 3.6
  • Proof
  • Theorem 3.8
  • ...and 18 more