Robust Estimation under the Wasserstein Distance

Sloan Nietert, Rachel Cummings, Ziv Goldfeld

TL;DR

This work proves new structural properties of partial optimal transport (POT) and uses them to show that minimum distance estimation (MDE) under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk in many settings. Along the way, it derives a novel dual form for POT that adds a sup-norm penalty to the classic Kantorovich dual for standard OT.

Abstract

We study the problem of robust distribution estimation under the Wasserstein distance, a popular discrepancy measure between probability distributions rooted in optimal transport (OT) theory. Given $n$ samples from an unknown distribution $\mu$, of which $\varepsilon n$ are adversarially corrupted, we seek an estimate for $\mu$ with minimal Wasserstein error. To address this task, we draw upon two frameworks from OT and robust statistics: partial OT (POT) and minimum distance estimation (MDE). We prove new structural properties for POT and use them to show that MDE under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk in many settings. Along the way, we derive a novel dual form for POT that adds a sup-norm penalty to the classic Kantorovich dual for standard OT. Since the popular Wasserstein generative adversarial network (WGAN) framework implements Wasserstein MDE via Kantorovich duality, our penalized dual enables large-scale generative modeling with contaminated datasets via an elementary modification to WGAN. Numerical experiments demonstrating the efficacy of our approach in mitigating the impact of adversarial corruptions are provided.
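To make MDE under a partial Wasserstein distance concrete, here is a toy case of our own, not an experiment from the paper: one-dimensional data, the model family of point masses $\{\delta_\theta\}$, and $\varepsilon = k/n$. Assuming $\mathsf{W}_1^\varepsilon$ removes mass $\varepsilon$ from each measure before transporting (the picture in Figure 1), trimming $\delta_\theta$ only rescales it. So $\mathsf{W}_1^\varepsilon(\delta_\theta, \hat{\mu}_n)$ equals $1/n$ times the sum of the $n-k$ smallest distances $|x_i - \theta|$, and the MDE becomes a least-trimmed absolute deviation fit. The helper names `trimmed_w1_to_point` and `mde_location` are hypothetical.

```python
def trimmed_w1_to_point(theta, samples, k):
    """W_1^eps(delta_theta, empirical measure) with eps = k / n (illustrative sketch).

    Removing mass eps from delta_theta leaves (1 - eps) * delta_theta, so the
    cheapest choice is to keep the n - k samples closest to theta, each of mass 1/n.
    """
    n = len(samples)
    dists = sorted(abs(x - theta) for x in samples)
    return sum(dists[: n - k]) / n


def mde_location(samples, k):
    """MDE over the hypothetical location family {delta_theta}.

    For any fixed kept subset, the best theta is a median of that subset, which
    can be taken to be a sample point, so scanning theta over the samples is exact.
    """
    return min(samples, key=lambda t: trimmed_w1_to_point(t, samples, k))


if __name__ == "__main__":
    data = [0.9, 1.0, 1.1, 1.2, 100.0]  # one of five points corrupted (eps = 0.2)
    theta_hat = mde_location(data, k=1)
    print(theta_hat, sum(data) / len(data))
```

On this data the estimate lands at 1.0 or 1.1 (the two tie up to floating-point rounding), while the sample mean is pulled to roughly 20.8. Exhaustive scanning only works for this one-dimensional toy family; for richer model families, the abstract's route is to optimize through the penalized Kantorovich dual, as in WGAN training.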

Paper Structure

This paper contains 23 sections, 18 theorems, 48 equations, 5 figures, and 1 table.

Key Result

Proposition 1

Let $\bar{x}\notin\mathcal{X}$ be a dummy point, define the disjoint union $\bar{\mathcal{X}}=\mathcal{X} \sqcup \{\bar{x}\}$, and suppose that the cost $c$ is non-negative. Define the augmented cost function $\bar{c}:\bar{\mathcal{X}}^2 \to [0,\infty]$ by

$$\bar{c}(x,y) = \begin{cases} c(x,y), & x,y \in \mathcal{X},\\ \infty, & x = y = \bar{x},\\ 0, & \text{otherwise}. \end{cases}$$

Then, for all $\mu,\nu \in \mathcal{P}(\mathcal{X})$, we have $\mathsf{OT}_c^{\varepsilon}(\mu,\nu) = \mathsf{OT}_{\bar{c}}(\mu + \varepsilon\delta_{\bar{x}}, \nu + \varepsilon\delta_{\bar{x}})$.
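Proposition 1 can be checked numerically on tiny examples. The sketch below is our own illustration under stated assumptions. It takes $\mathcal{X} = \mathbb{R}$ with $c(x,y) = |x-y|$, uniform empirical measures on $n$ atoms, and $\varepsilon = k/n$, so the extra mass $\varepsilon\delta_{\bar{x}}$ splits into $k$ dummy atoms of mass $1/n$ and both problems reduce to matchings that can be searched exhaustively. `partial_ot_direct` trims mass $\varepsilon$ from each measure (as in Figure 1) and matches the rest; `partial_ot_augmented` solves standard OT under $\bar{c}$. All names, including the `DUMMY` sentinel standing in for $\bar{x}$, are hypothetical, and the brute force is only meant for a handful of points.

```python
import itertools
import math

# Sentinel standing in for the dummy point x_bar (hypothetical name).
DUMMY = object()


def cost(x, y):
    """Base cost c on X = R; |x - y| (the W_1 cost) is an illustrative choice."""
    return abs(x - y)


def augmented_cost(x, y):
    """Augmented cost c_bar on X plus {x_bar}, following Proposition 1."""
    if x is DUMMY and y is DUMMY:
        return math.inf  # the dummy may not be matched to itself
    if x is DUMMY or y is DUMMY:
        return 0.0  # sending mass to, or drawing mass from, the dummy is free
    return cost(x, y)


def partial_ot_direct(xs, ys, k):
    """OT_c^eps with eps = k/n between uniform measures on n atoms each:
    drop k atoms from each side and optimally match the remaining n - k."""
    n = len(xs)
    best = math.inf
    for keep_x in itertools.combinations(range(n), n - k):
        for keep_y in itertools.combinations(range(n), n - k):
            for perm in itertools.permutations(keep_y):
                total = sum(cost(xs[i], ys[j]) for i, j in zip(keep_x, perm))
                best = min(best, total)
    return best / n


def partial_ot_augmented(xs, ys, k):
    """OT_cbar(mu + eps*delta_xbar, nu + eps*delta_xbar), with the extra mass
    eps = k/n written as k dummy atoms of mass 1/n (a balanced assignment)."""
    n = len(xs)
    xs_bar = list(xs) + [DUMMY] * k
    ys_bar = list(ys) + [DUMMY] * k
    best = math.inf
    for perm in itertools.permutations(range(n + k)):
        total = sum(augmented_cost(xs_bar[i], ys_bar[j]) for i, j in enumerate(perm))
        best = min(best, total)
    return best / n
```

With `xs = [0.0, 1.0, 2.0, 10.0]` and `ys = [0.0, 1.0, 2.0, 3.0]`, both functions return 1.75 for `k = 0` (standard $\mathsf{W}_1$) and 0.0 for `k = 1`, since trimming the outlier at 10 and the atom at 3 leaves identical measures. In the augmented problem, the infinite cost between dummy atoms is what forces exactly mass $\varepsilon$ of each measure to be routed through $\bar{x}$.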

Figures (5)

  • Figure 1: Visualization of an alternative primal formulation of $\mathsf{W}_1^\varepsilon$. The gridded light blue and green regions each have mass $\varepsilon$ and are removed to obtain the optimal $\mu_{-}$ and $\nu_{-}$, respectively, for $\mathsf{W}_1^\varepsilon$. No mass need be removed from the dark region designating $\mu \land \nu$.
  • Figure 2: Optimal potentials: (left) 1D densities plotted with their optimal potential for the $\mathsf{W}_1^\varepsilon$ dual problem; (right) contour plots for optimal dual potentials to $\mathsf{W}_1$ and $\mathsf{W}_1^\varepsilon$ between 2D Gaussian mixtures. Observe how optimal potentials for the robust dual are flat over outlier mass.
  • Figure 3: Visual depiction of MDE under $\mathsf{W}_p^\varepsilon$ and its analysis. Solid lines represent statistical distance bounds given by the problem formulation, and dotted lines represent bounds deduced by our choice of estimator and the approximate triangle inequality for $\mathsf{W}_p^\varepsilon$. We abbreviate $\delta_n = \mathsf{W}_p(\hat{\mu}_n,\mu)$.
  • Figure 4: (Top) Samples generated by robustified (left) and standard (right) WGAN-GP after training on a corrupted MNIST dataset. (Bottom) Samples generated by robustified (left) and standard (right) StyleGAN 2 after training on a corrupted CelebA-HQ dataset.
  • Figure 5: Samples generated by various robust WGANs after 100k batches of training.

Theorems & Definitions (35)

  • Proposition 1: POT as augmented standard OT (Caffarelli and McCann, 2010)
  • Proposition 2: Robustness of MDE (Donoho and Liu, 1988)
  • Lemma 1: Bounded modulus under resilience (Zhu et al., 2019)
  • Example 1: Robust mean estimation
  • Proposition 3: Equivalent reformulations
  • Proposition 4: Basic properties
  • Theorem 1: Dual form
  • Proof sketch
  • Remark 1: TV as a dual norm
  • Proposition 5: Loss trimming dual
  • ...and 25 more