Wasserstein distributionally robust optimization and its tractable regularization formulations

Hong T. M. Chu; Meixia Lin; Kim-Chuan Toh

TL;DR

A flexible framework is developed that derives lower and upper bounds on the worst-case loss in Wasserstein distributionally robust optimization and gives sufficient conditions under which this worst-case loss coincides with its regularized counterpart.

Abstract

We study a variety of Wasserstein distributionally robust optimization (WDRO) problems where the distributions in the ambiguity set are chosen by constraining their Wasserstein discrepancies to the empirical distribution. Using the notion of the weak Lipschitz property, we derive lower and upper bounds on the corresponding worst-case loss quantity and propose sufficient conditions under which this quantity coincides with its regularization scheme counterpart. Our constructive methodology and elementary analysis also directly characterize the closed form of the approximate worst-case distribution. Extensive applications show that our theoretical results are applicable to various problems, including regression, classification and risk measure problems.
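As a rough illustration of the kind of equivalence described in the abstract, the sketch below works through a toy case that is not taken from the paper: the loss $f(z)=|z-c|$ on $\mathcal{Z}=\mathbb{R}$ with $d(z',z)=|z'-z|$ and $r=1$. The loss is 1-Lipschitz, so by Kantorovich-Rubinstein duality the worst-case expected loss over a Wasserstein ball of radius `eps` is at most the empirical loss plus `eps`. Moving every sample a distance `eps` away from `c` attains that bound. The function names, the sample values and the radius are all assumptions chosen for the example.

```python
import numpy as np


def abs_loss(z, c=0.0):
    """Toy 1-Lipschitz loss f(z) = |z - c| (an illustrative choice)."""
    return np.abs(z - c)


def regularized_value(samples, eps, lip=1.0, c=0.0):
    """Empirical loss plus radius times Lipschitz constant."""
    return abs_loss(samples, c).mean() + eps * lip


def shifted_samples(samples, eps, c=0.0):
    """Move each sample a distance eps away from c (ties go to the right)."""
    direction = np.where(samples >= c, 1.0, -1.0)
    return samples + eps * direction


# Two empirical atoms on the real line (hypothetical values).
samples = np.array([-1.0, 2.0])
eps = 0.5

shifted = shifted_samples(samples, eps)

# Pairing each atom with its shifted copy is a feasible coupling,
# so its average cost upper-bounds the order-1 Wasserstein distance.
transport_cost = np.abs(shifted - samples).mean()

candidate_loss = abs_loss(shifted).mean()
reg = regularized_value(samples, eps)

print(transport_cost, candidate_loss, reg)
```

In this toy case the shifted distribution stays within the ball and its expected loss matches the regularized value, so the upper and lower bounds meet. The paper's weak Lipschitz conditions cover far more general losses and discrepancies than this one-dimensional sketch.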

Paper Structure (33 sections, 16 theorems, 169 equations, 1 figure, 1 table)

Introduction
Main contributions
Theoretical analysis of the equivalence
Lower and Upper bounds of the worst-case loss quantity
Equivalence in \eqref{eq:main_equation} when $r=1$
Equivalence in \eqref{eq:main_equation} when $r>1$
Case 1.
Case 2.
Applications to different function classes
Applications to simple piecewise linear regression loss functions
Applications to nonlinear regression loss functions
A special regression model
Applications to classification loss functions
Generalization to risk measure
Conclusion
...and 18 more sections

Key Result

Lemma 3.1

Given any distribution $\mathbb{P}\in\mathcal{P}(\mathcal{Z})$ and any point $\hat{z}\in \mathcal{Z}$, for any scalar $r\geq 1$ and any extended nonnegative-valued measurable function $d\colon\mathcal{Z}\times\mathcal{Z} \rightarrow [0,\infty]$, we have

Figures (1)

Figure 1: An illustration of Assumptions (A1-A2) (best viewed in color) when $\mathcal{Z}=\mathbb{R}, \mathcal{Z}_{N}=\{Z^{(1)},Z^{(2)} \}$ and $d(z',z) = \left|z'-z\right|$.

Theorems & Definitions (33)

Lemma 3.1
Definition 3.1
Definition 3.2: Weak Lipschitz property
Theorem 3.1
Theorem 3.2
Remark 3.1
Example 3.1: Binary cross-entropy [yi2004automated, scott2012calibrated, hurtik2022binary]
Example 3.2: Hard sigmoid [howard2019searching] / HardTanh [collobert2004large]
Theorem 3.3
Definition 4.1
...and 23 more
