Robust Estimation under the Wasserstein Distance

Sloan Nietert; Rachel Cummings; Ziv Goldfeld

TL;DR

This work derives a novel dual form for partial OT (POT) that adds a sup-norm penalty to the classic Kantorovich dual for standard OT, proves new structural properties for POT, and shows that minimum distance estimation (MDE) under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk in many settings.

Abstract

We study the problem of robust distribution estimation under the Wasserstein distance, a popular discrepancy measure between probability distributions rooted in optimal transport (OT) theory. Given $n$ samples from an unknown distribution $\mu$, of which $\varepsilon n$ are adversarially corrupted, we seek an estimate for $\mu$ with minimal Wasserstein error. To address this task, we draw upon two frameworks from OT and robust statistics: partial OT (POT) and minimum distance estimation (MDE). We prove new structural properties for POT and use them to show that MDE under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk in many settings. Along the way, we derive a novel dual form for POT that adds a sup-norm penalty to the classic Kantorovich dual for standard OT. Since the popular Wasserstein generative adversarial network (WGAN) framework implements Wasserstein MDE via Kantorovich duality, our penalized dual enables large-scale generative modeling with contaminated datasets via an elementary modification to WGAN. Numerical experiments demonstrating the efficacy of our approach in mitigating the impact of adversarial corruptions are provided.
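The sup-norm penalty described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's training code: the function name `penalized_critic_objective`, the parameter `penalty_weight` (the paper's exact coefficient is not stated in this summary), and the use of a batch maximum as a stand-in for the sup-norm are all assumptions made for illustration.

```python
def penalized_critic_objective(real_scores, fake_scores, penalty_weight):
    """Empirical Kantorovich-style objective minus a sup-norm penalty.

    real_scores / fake_scores: critic outputs f(x) on real and generated samples.
    penalty_weight: hypothetical weight on the sup-norm term (an assumption).
    """
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    # The sup-norm of f is approximated by the largest |f| seen in the batch.
    sup_norm = max(abs(s) for s in list(real_scores) + list(fake_scores))
    return mean_real - mean_fake - penalty_weight * sup_norm


# One "outlier" real sample receives a very large critic score.
real_scores = [1.0, 1.0, 1.0, 9.0]
fake_scores = [0.0, 0.0, 0.0, 0.0]

standard = penalized_critic_objective(real_scores, fake_scores, 0.0)
penalized = penalized_critic_objective(real_scores, fake_scores, 0.5)

# A critic that stays flat on the outlier does better under the penalty.
flat_scores = [1.0, 1.0, 1.0, 1.0]
flattened = penalized_critic_objective(flat_scores, fake_scores, 0.5)
```

In this toy batch the unpenalized objective rewards the large score on the outlier, while the penalized objective prefers the critic that stays flat there, which matches the qualitative behavior described for the robust dual potentials in Figure 2.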

Paper Structure (23 sections, 18 theorems, 48 equations, 5 figures, 1 table)

Introduction
Our Contributions
Related Work
Preliminaries
(Partial) Optimal Transport
Minimum Distance Estimation
WGAN as MDE under $\mathsf{W}_1$.
Robust statistics and MDE under $\|\cdot\|_\mathsf{TV}$.
Generalized resilience.
Structure of POT
Robust Estimation under $\mathsf{W}_p$
Error and risk.
The estimator.
Population-Limit Guarantees and Resilience
Finite-Sample Guarantees
...and 8 more sections

Key Result

Proposition 1

Let $\bar{x}\notin\mathcal{X}$ be a dummy point, define the disjoint union $\bar{\mathcal{X}}=\mathcal{X} \sqcup \{\bar{x}\}$, and suppose that the cost $c$ is non-negative. Define the augmented cost function $\bar{c}:\bar{\mathcal{X}}^2 \to [0,\infty]$ by Then, for all $\mu,\nu \in \mathcal{P}(\mathcal{X})$, we have $\mathsf{OT}_c^{\varepsilon}(\mu,\nu) = \mathsf{OT}_{\bar{c}}(\mu + \varepsilon
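The dummy-point construction can be checked by brute force on a tiny example. The displayed definition of $\bar{c}$ and the end of the statement are not reproduced in this excerpt, so the sketch below assumes the standard form of the construction: $\bar{c}$ equals $c$ on $\mathcal{X}^2$, is zero between any real point and $\bar{x}$, and is infinite at $(\bar{x},\bar{x})$, with mass $\varepsilon$ placed on $\bar{x}$ in both marginals. It also assumes uniform empirical measures and $\varepsilon = k/n$, so that the dummy mass splits into $k$ atoms and an optimal plan can be found among permutations. All function names are hypothetical.

```python
from itertools import permutations
import math

DUMMY = object()  # stands in for the dummy point x-bar


def augmented_cost(x, y):
    """Assumed augmented cost: |x - y| on real points, 0 to/from the dummy,
    and infinity between the dummy and itself."""
    if x is DUMMY and y is DUMMY:
        return math.inf
    if x is DUMMY or y is DUMMY:
        return 0.0
    return abs(x - y)


def ot_equal_atoms(xs, ys, cost, atom_mass):
    """Exact OT between measures with equally weighted atoms (brute force).

    With equal atom masses an optimal plan is a permutation, so we can
    enumerate all of them. Only feasible for a handful of atoms.
    """
    n = len(xs)
    best = min(
        sum(cost(xs[i], ys[p[i]]) for i in range(n))
        for p in permutations(range(n))
    )
    return best * atom_mass


def partial_ot(xs, ys, k):
    """POT with eps = k/n, computed as standard OT on the augmented space."""
    atom_mass = 1.0 / len(xs)
    aug_xs = list(xs) + [DUMMY] * k  # dummy carries mass k/n = eps
    aug_ys = list(ys) + [DUMMY] * k
    return ot_equal_atoms(aug_xs, aug_ys, augmented_cost, atom_mass)


xs = [0, 1, 2]
ys = [0, 1, 100]  # 100 plays the role of a corrupted sample

standard = partial_ot(xs, ys, k=0)  # ordinary OT with cost |x - y|
partial = partial_ot(xs, ys, k=1)   # eps = 1/3: one atom may be discarded per side
```

Here the standard cost is dominated by the pair (2, 100), whereas the partial cost is zero because the point 2 is sent to the dummy and the dummy mass is sent to 100 at no cost.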

Figures (5)

Figure 1: Visualization of Proposition \ref{prop:alternative-primal-probs}. The gridded light blue and green regions each have mass $\varepsilon$ and are removed to obtain optimal $\mu_{-}$ and $\nu_{-}$ for $\mathsf{W}_1^\varepsilon$. No mass need be removed from the dark region designating $\mu \land \nu$.
Figure 2: Optimal potentials: (left) 1D densities plotted with their optimal potential for the $\mathsf{W}_1^\varepsilon$ dual problem; (right) contour plots for optimal dual potentials to $\mathsf{W}_1$ and $\mathsf{W}_1^\varepsilon$ between 2D Gaussian mixtures. Observe how optimal potentials for the robust dual are flat over outlier mass.
Figure 3: Visual depiction of MDE under $\mathsf{W}_p^\varepsilon$ and its analysis. Solid lines represent statistical distance bounds given by the problem formulation, and dotted lines represent bounds deduced by our choice of estimator and the approximate triangle inequality for $\mathsf{W}_p^\varepsilon$. We abbreviate $\delta_n = \mathsf{W}_p(\hat{\mu}_n,\mu)$.
Figure 4: (top) samples generated by robustified (left) and standard (right) WGAN-GP after training on a corrupted MNIST dataset; (bottom) samples generated by robustified (left) and standard (right) StyleGAN 2 after training on a corrupted CelebA-HQ dataset.
Figure 5: Samples generated by various robust WGANs after 100k batches of training.

Theorems & Definitions (35)

Proposition 1: POT as augmented standard OT, caffarelli2010
Proposition 2: Robustness of MDE, donoho88
Lemma 1: Bounded modulus under resilience, zhu2019resilience
Example 1: Robust mean estimation
Proposition 3: Equivalent reformulations
Proposition 4: Basic properties
Theorem 1: Dual form
Proof: Proof sketch
Remark 1: TV as a dual norm
Proposition 5: Loss trimming dual
...and 25 more
