Inference via robust optimal transportation: theory and methods

Yiming Ma, Hang Liu, Davide La Vecchia, Matthieu Lerasle

TL;DR

A robust version of the primal transportation problem is considered and it is shown that it defines the robust Wasserstein distance, $W^{(\lambda)}$, depending on a tuning parameter $\lambda>0$.

Abstract

Optimal transportation theory and the related $p$-Wasserstein distance ($W_p$, $p\geq 1$) are widely applied in statistics and machine learning. Despite their popularity, inference based on these tools has some issues: for instance, it is sensitive to outliers, and it may not even be defined when the underlying model has infinite moments. To cope with these problems, first we consider a robust version of the primal transportation problem and show that it defines the robust Wasserstein distance, $W^{(\lambda)}$, depending on a tuning parameter $\lambda > 0$. Second, we illustrate the link between $W_1$ and $W^{(\lambda)}$ and study its key measure-theoretic aspects. Third, we derive some concentration inequalities for $W^{(\lambda)}$. Fourth, we use $W^{(\lambda)}$ to define minimum distance estimators, provide their statistical guarantees, and illustrate how to apply the derived concentration inequalities for a data-driven selection of $\lambda$. Fifth, we provide the dual form of the robust optimal transportation problem and apply it to machine learning problems (generative adversarial networks and domain adaptation). Numerical exercises provide evidence of the benefits yielded by our novel methods.

Paper Structure

This paper contains 27 sections, 19 theorems, 142 equations, 14 figures, 3 tables, 3 algorithms.

Key Result

Theorem 2.1

Let $\mu,\nu$ be probability measures in $\mathcal{P}(\mathcal{X})$, with $c_{\lambda}(x,y)$ as in (Eqc_l). Then the $c$-transform of a $c$-convex function $\psi(x)$ is itself, i.e. $\psi^{c}(x)= \psi(x)$. Moreover, the dual form of ROBOT is related to the Kantorovich potential $\psi$, which is a solution to the dual problem, where $\psi$ satisfies $\vert \psi(x)-\psi(y) \vert \leq d(x,y)$ and $\mathrm{range}(\psi) \leq 2\lambda$ ...
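The two constraints on $\psi$ in Theorem 2.1 can be checked numerically on a small discrete example. The sketch below is only an illustration under stated assumptions: the displayed dual objective is missing from this excerpt, so the code assumes the standard Kantorovich-Rubinstein form $\sup_\psi \int \psi \, d\mu - \int \psi \, d\nu$, and it assumes the truncated cost $c_\lambda(x,y) = \min\{d(x,y), 2\lambda\}$ for the primal problem. The support points, weights and value of $\lambda$ are made up for the example.

```python
import numpy as np
from scipy.optimize import linprog

lam = 3.0                                # tuning parameter lambda (illustrative)
x = np.array([0.0, 1.0, 2.0, 10.0])      # hypothetical common support on the line
mu = np.array([0.4, 0.4, 0.0, 0.2])      # hypothetical weights of mu
nu = np.array([0.0, 0.5, 0.5, 0.0])      # hypothetical weights of nu
m = len(x)

dist = np.abs(x[:, None] - x[None, :])   # d(x_i, x_j)
cost = np.minimum(dist, 2.0 * lam)       # ASSUMED truncated cost c_lambda

# Primal: minimise <cost, pi> over couplings pi with marginals mu and nu.
a_eq = np.vstack([
    np.kron(np.eye(m), np.ones((1, m))),   # row sums of pi equal mu
    np.kron(np.ones((1, m)), np.eye(m)),   # column sums of pi equal nu
])
b_eq = np.concatenate([mu, nu])
primal = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                 bounds=(0, None), method="highs")

# Dual (ASSUMED objective): maximise sum_i psi_i * (mu_i - nu_i) subject to
# psi_i - psi_j <= min(d_ij, 2 * lam), i.e. psi is 1-Lipschitz with range <= 2 * lam.
pairs = [(i, j) for i in range(m) for j in range(m) if i != j]
a_ub = np.zeros((len(pairs), m))
for row, (i, j) in enumerate(pairs):
    a_ub[row, i], a_ub[row, j] = 1.0, -1.0
b_ub = np.array([cost[i, j] for i, j in pairs])
# psi is only defined up to an additive constant, so pin psi_0 = 0.
bounds = [(0.0, 0.0)] + [(None, None)] * (m - 1)
dual = linprog(-(mu - nu), A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")

psi = dual.x
primal_value, dual_value = primal.fun, -dual.fun
print(primal_value, dual_value, psi.max() - psi.min())
```

On a finite support the two linear programs are dual to each other, so the printed primal and dual values should agree up to solver tolerance, and the range of the computed potential cannot exceed $2\lambda$.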

Figures (14)

  • Figure 1: Wasserstein distance ($W_1$ and $W_2$) and robust Wasserstein distance ($W^{(\lambda)}$, with $\lambda=3$) between two bivariate distributions. The scatter plot of data in panel (a) represents a sample from the reference model: cross points (blue) are generated from $\mathcal{N}\left( \binom{-1}{-1},I_2 \right)$, points (orange) are generated from $\mathcal{N}\left( \binom{1}{1},I_2 \right)$. The plot in panel (b) contains some outliers: cross points (blue) are generated from $0.8 \mathcal{N}\left( \binom{-1}{-1},I_2 \right)+0.2\mathcal{N}\left( \binom{-9}{-9},I_2 \right)$, points (orange) are generated from $\mathcal{N}\left( \binom{1}{1},I_2 \right)$. The marginal distributions are plotted on the $x$- and $y$-axis.
  • Figure 2: The continuous line represents $\Delta(x,W_1)$. The dashed (red), dot-dashed (blue) and dotted (green) lines represent $\Delta(x,W^{(\lambda)})$ with $\lambda=3, 4, 5$, respectively.
  • Figure 3: Blue triangles: 200 data points sampled from the reference distribution (uncontaminated observations). Red squares: 200 data points generated from WGAN (1st row), RWGAN-1 (2nd row), RWGAN-2 (3rd row), RWGAN-B (4th row), RWGAN-N with $\epsilon = 0.07$ (5th row), RWGAN-N with $\epsilon = 0.25$ (6th row) and RWGAN-D (7th row). The empirical Wasserstein distance of order $p=1$ between triangles and squares is provided. (1st column: $\varepsilon = 0.1$, $\eta = 2$; 2nd column: $\varepsilon = 0.1$, $\eta = 3$; 3rd column: $\varepsilon = 0.2$, $\eta = 2$; 4th column: $\varepsilon = 0.2$, $\eta = 3$).
  • Figure 4: Style of clean reference sample images and outlier images. Panel (a) contains pictures that are regarded as the right style. Panel (b) contains negative pictures (outliers).
  • Figure 5: 64 images generated from WGAN (1st row), RWGAN-1 (2nd row), RWGAN-2 (3rd row), RWGAN-B (4th row). (1st column: no outliers; 2nd column: 3000 outliers; 3rd column: 6000 outliers).
  • ...and 9 more figures
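The comparison described in the caption of Figure 1 can be imitated with a few lines of Python. This is a minimal sketch, not the authors' code: it assumes that $W^{(\lambda)}$ between empirical measures is the optimal transport cost under the truncated cost $\min\{d(x,y), 2\lambda\}$ (the excerpt does not spell out $c_\lambda$), and it relies on the fact that for two equal-size, uniformly weighted samples the optimal coupling can be taken to be a one-to-one matching. The sample size, the seed and the helper name `empirical_ot_cost` are illustrative choices.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def empirical_ot_cost(x, y, lam=None):
    """Exact OT cost between two equal-size, uniformly weighted samples.

    lam=None uses the Euclidean cost (this gives the empirical W_1);
    otherwise the cost is truncated at 2 * lam, the ASSUMED form of c_lambda.
    """
    cost = cdist(x, y)                      # pairwise Euclidean distances
    if lam is not None:
        cost = np.minimum(cost, 2.0 * lam)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[rows, cols].mean()


rng = np.random.default_rng(0)
n, lam = 300, 3.0                           # lambda = 3 as in Figure 1

target = rng.normal(loc=1.0, size=(n, 2))   # N((1, 1), I_2)
clean = rng.normal(loc=-1.0, size=(n, 2))   # N((-1, -1), I_2)

# Contaminated sample: each point is replaced, with probability 0.2,
# by a draw from N((-9, -9), I_2).
is_outlier = rng.random(n) < 0.2
contaminated = clean.copy()
contaminated[is_outlier] = rng.normal(loc=-9.0, size=(int(is_outlier.sum()), 2))

w1_clean = empirical_ot_cost(clean, target)
w1_cont = empirical_ot_cost(contaminated, target)
wl_clean = empirical_ot_cost(clean, target, lam=lam)
wl_cont = empirical_ot_cost(contaminated, target, lam=lam)

print(f"W_1:        clean={w1_clean:.3f}  contaminated={w1_cont:.3f}")
print(f"W^(lambda): clean={wl_clean:.3f}  contaminated={wl_cont:.3f}")
```

Because the truncated cost never exceeds the Euclidean cost, the robust value is always at most the $W_1$ value and at most $2\lambda$; the point of the experiment is that contamination moves the robust value much less than it moves $W_1$.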

Theorems & Definitions (23)

  • Theorem 2.1
  • Lemma 3.1
  • Theorem 3.2
  • Definition 3.3: Robust Wasserstein space
  • Theorem 3.4
  • Theorem 3.5
  • Corollary 3.6: Continuity of $W^{(\lambda)}$
  • Theorem 3.7
  • Theorem 3.8
  • Theorem 3.9
  • ...and 13 more