
Wasserstein distributionally robust optimization and its tractable regularization formulations

Hong T. M. Chu, Meixia Lin, Kim-Chuan Toh

TL;DR

A flexible framework is developed for deriving lower and upper bounds on the worst-case loss in Wasserstein distributionally robust optimization and for establishing sufficient conditions under which this quantity coincides with its regularization-scheme counterpart.

Abstract

We study a variety of Wasserstein distributionally robust optimization (WDRO) problems in which the distributions in the ambiguity set are chosen by constraining their Wasserstein discrepancies to the empirical distribution. Using the notion of a weak Lipschitz property, we derive lower and upper bounds on the corresponding worst-case loss and propose sufficient conditions under which this quantity coincides with its regularization-scheme counterpart. Our constructive methodology and elementary analysis also directly characterize the closed form of the approximate worst-case distribution. Extensive applications show that our theoretical results apply to various problems, including regression, classification, and risk measure problems.
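
The Python sketch below illustrates, in the simplest possible setting, the kind of equality between a worst-case loss and a regularized objective that the abstract describes. It is a classical textbook instance, not the paper's method, and it assumes a one-dimensional sample, the absolute loss l(z) = |z - theta|, the cost d(z', z) = |z' - z| with order r = 1, and a hypothetical radius eps; all function names are illustrative. By Kantorovich-Rubinstein duality, any distribution within type-1 Wasserstein distance eps of the empirical distribution has expected loss at most the empirical loss plus eps times the Lipschitz constant. Shifting every atom eps units away from theta attains this bound, so here the shifted empirical distribution is an explicit worst-case distribution.

```python
import numpy as np

# Illustrative setting (assumed, not from the paper): Z = R, cost d(z', z) = |z' - z|,
# order r = 1, loss l(z) = |z - theta|, which is 1-Lipschitz in z.

def abs_loss(z, theta):
    """Absolute loss |z - theta|, evaluated elementwise."""
    return np.abs(z - theta)

def regularized_objective(samples, theta, eps, lip=1.0):
    """Empirical mean loss plus the Lipschitz regularizer eps * lip."""
    return abs_loss(samples, theta).mean() + eps * lip

def shifted_atoms(samples, theta, eps):
    """Move every atom eps units away from theta (atoms equal to theta move right)."""
    direction = np.where(samples >= theta, 1.0, -1.0)
    return samples + eps * direction

def coupling_cost(source, target):
    """Mean displacement under the index-wise coupling; an upper bound on W1."""
    return np.abs(target - source).mean()

rng = np.random.default_rng(0)
samples = rng.normal(size=50)   # hypothetical empirical sample
theta, eps = 0.3, 0.25          # hypothetical decision and ball radius

shifted = shifted_atoms(samples, theta, eps)
worst_case = abs_loss(shifted, theta).mean()
bound = regularized_objective(samples, theta, eps)

print(f"transport cost: {coupling_cost(samples, shifted):.6f}")
print(f"loss at shifted distribution: {worst_case:.6f}")
print(f"empirical loss + eps * Lip: {bound:.6f}")
```

The coupling cost equals eps, so the shifted distribution lies in the ambiguity ball, and its expected loss matches the regularized objective up to rounding. For general losses only the upper bound is guaranteed; the sufficient conditions mentioned in the abstract concern when equality holds.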

Paper Structure (33 sections, 16 theorems, 169 equations, 1 figure, 1 table)

Key Result

Lemma 3.1

Given any distribution $\mathbb{P}\in\mathcal{P}(\mathcal{Z})$ and any point $\hat{z}\in \mathcal{Z}$, for any scalar $r\geq 1$ and any extended nonnegative-valued measurable function $d\colon\mathcal{Z}\times\mathcal{Z} \rightarrow [0,\infty]$, we have

Figures (1)

  • Figure 1: An illustration of Assumptions (A1-A2) (best viewed in color) when $\mathcal{Z}=\mathbb{R}, \mathcal{Z}_{N}=\{Z^{(1)},Z^{(2)} \}$ and $d(z',z) = \left|z'-z\right|$.

Theorems & Definitions (33)

  • Lemma 3.1
  • Definition 3.1
  • Definition 3.2: Weak Lipschitz property
  • Theorem 3.1
  • Theorem 3.2
  • Remark 3.1
  • Example 3.1: Binary cross-entropy [yi2004automated, scott2012calibrated, hurtik2022binary]
  • Example 3.2: Hard sigmoid [howard2019searching] / HardTanh [collobert2004large]
  • Theorem 3.3
  • Definition 4.1
  • ...and 23 more
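
Examples 3.1 and 3.2 are named here only by title. The sketch below assumes, without confirmation from this summary, that they serve as instances of Lipschitz-type losses, and it numerically checks the slope bounds of the three named functions: binary cross-entropy as a function of the logit t (derivative sigmoid(t) - y, Lipschitz constant 1), hard sigmoid in the ReLU6(x + 3) / 6 form (constant 1/6), and HardTanh (constant 1). The helper names are hypothetical, and the finite-difference maximum is only a lower bound on each true Lipschitz constant.

```python
import numpy as np

def bce_logit(t, y):
    """Binary cross-entropy for label y in {0, 1} at logit t, computed stably."""
    # -[y log sigmoid(t) + (1 - y) log(1 - sigmoid(t))] = log(1 + exp(t)) - y * t
    return np.logaddexp(0.0, t) - y * t

def hard_sigmoid(x):
    """ReLU6(x + 3) / 6: linear with slope 1/6 on [-3, 3], constant outside."""
    return np.clip(x + 3.0, 0.0, 6.0) / 6.0

def hard_tanh(x):
    """Clip x to [-1, 1]: slope 1 inside, constant outside."""
    return np.clip(x, -1.0, 1.0)

def max_slope(f, grid):
    """Largest finite-difference slope of f on a sorted grid (lower bound on Lip(f))."""
    values = f(grid)
    return np.max(np.abs(np.diff(values) / np.diff(grid)))

grid = np.linspace(-10.0, 10.0, 20001)
slopes = {
    "bce (y=1)": max_slope(lambda t: bce_logit(t, 1.0), grid),
    "bce (y=0)": max_slope(lambda t: bce_logit(t, 0.0), grid),
    "hard_sigmoid": max_slope(hard_sigmoid, grid),
    "hard_tanh": max_slope(hard_tanh, grid),
}
for name, slope in slopes.items():
    print(f"{name}: {slope:.5f}")
```

The cross-entropy slopes approach but never exceed 1 because |sigmoid(t) - y| < 1 for finite t, while the two piecewise-linear activations reach their constants exactly on their linear pieces.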