Sparsistency for Inverse Optimal Transport

Francisco Andrade, Gabriel Peyré, Clarice Poon

TL;DR

An in-depth theoretical study of $\ell^1$ regularization for Inverse Optimal Transport, used to model, for instance, Euclidean costs with sparse interactions between features. The paper derives a sufficient condition for the robust recovery of the sparsity of the ground cost, which can be seen as a far-reaching generalization of the Lasso's celebrated Irrepresentability Condition.

Abstract

Optimal Transport is a useful metric to compare probability distributions and to compute a pairing given a ground cost. Its entropic regularization variant (eOT) is crucial for obtaining fast algorithms and for reflecting fuzzy/noisy matchings. This work focuses on Inverse Optimal Transport (iOT), the problem of inferring the ground cost from samples drawn from a coupling that solves an eOT problem. It is a relevant problem that can be used to infer unobserved/missing links, and to obtain meaningful information about the structure of the ground cost yielding the pairing. On the one hand, iOT benefits from convexity; on the other hand, being ill-posed, it requires regularization to handle the sampling noise. This work presents an in-depth theoretical study of $\ell^1$ regularization to model, for instance, Euclidean costs with sparse interactions between features. Specifically, we derive a sufficient condition for the robust recovery of the sparsity of the ground cost that can be seen as a far-reaching generalization of the Lasso's celebrated Irrepresentability Condition. To provide additional insight into this condition, we work out the Gaussian case in detail. We show that as the entropic penalty varies, the iOT problem interpolates between a graphical Lasso and a classical Lasso, thereby establishing a connection between iOT and graph estimation, an important problem in ML.
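To make the forward problem concrete, here is a minimal sketch of how an eOT coupling can be computed and sampled from, which is the kind of data iOT takes as input. It uses the standard Sinkhorn iterations for the problem $\min_\pi \langle C, \pi\rangle + \varepsilon\,\mathrm{KL}(\pi \,|\, a \otimes b)$; this sign and scaling convention is an assumption and may differ from the one used in the paper. The function name `sinkhorn`, the squared Euclidean cost, the point clouds, and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def sinkhorn(cost, a, b, eps, n_iter=500):
    """Entropic OT coupling via Sinkhorn iterations (illustrative sketch).

    Assumes the convention min_pi <cost, pi> + eps * KL(pi | a x b),
    whose solution has the form diag(u) K diag(v) with K = exp(-cost / eps).
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
x = rng.uniform(size=(5, 2))         # hypothetical source points
y = rng.uniform(size=(6, 2))         # hypothetical target points
a = np.full(5, 1 / 5)                # uniform source weights
b = np.full(6, 1 / 6)                # uniform target weights

# Squared Euclidean ground cost between every pair (x_i, y_j).
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
pi = sinkhorn(cost, a, b, eps=1.0)

# Draw matched index pairs (i, j) from the coupling; iOT would try to
# recover the ground cost from such samples.
flat = rng.choice(pi.size, size=1000, p=pi.ravel())
pairs = np.column_stack(np.unravel_index(flat, pi.shape))
```

A larger `eps` gives a more diffuse coupling (fuzzier matchings), while a smaller `eps` concentrates mass on low-cost pairs but typically needs more iterations to converge.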

Paper Structure (47 sections, 21 theorems, 178 equations, 3 figures)

Key Result

Proposition 2

$A \mapsto W(A)$ is twice differentiable and strictly convex, with gradient and Hessian expressed in terms of $\pi_A$, the unique solution to (EntropicOT) with cost $c_A=\Phi A$.

Figures (3)

  • Figure 1: Display of the certificate values $z_{i,j}$ for three types of graphs, for varying $\varepsilon$. Left, middle: plotted as a function of the geodesic distance $d_{\text{geod}}(i,j)$ on the $x$-axis. Right: histogram of $z_{i,j}$ for $(i,j)$ at distance $d_{\text{geod}}(i,j)=2$.
  • Figure 2: Recovery performance (number of wrongly estimated positions) of $\ell^1$--iOT as a function of $\lambda$ for three different values of $\varepsilon$.
  • Figure 3: Display of certificate values for a non-symmetric planar graph, for varying $\varepsilon$ with edges only for $i > j$. Middle/Right: plots of the certificate values as a function of the geodesic distance $d_{\text{geod}}(i,j)$ of the symmetrized graph. The middle plot shows the values restricted to $i\geqslant j$. The right plot shows the values restricted to $i\leqslant j$ (where there are no edges).

Theorems & Definitions (38)

  • Definition 1
  • Proposition 2
  • Theorem 3
  • Proposition 4
  • Theorem 5
  • Proposition 6
  • Lemma 7
  • Proposition 8: $\varepsilon\to \infty$
  • Proposition 9: $\varepsilon\to 0$
  • Proposition 10
  • ...and 28 more