
On sparsity, extremal structure, and monotonicity properties of Wasserstein and Gromov-Wasserstein optimal transport plans

Titouan Vayer

TL;DR

The paper investigates how Gromov-Wasserstein (GW) optimal plans relate to classical properties of linear OT. By introducing a conditionally negative semi-definite (CND) tensor framework for the GW loss, it shows that concavity of the GW objective on the coupling polytope implies that some optimal plans are sparse extreme points and that relaxing permutations to couplings is tight, mirroring the Monge–Kantorovich equivalence under suitable conditions. It provides a detailed characterization for separable losses and gives concrete examples, notably the squared-distance and KL-divergence cases, where the CND conditions hold via Schoenberg-type results and infinite divisibility, respectively. The work also discusses a monotonicity-type property for GW that arises from a linearization at the GW optimum, and clarifies the limits of these properties beyond the CND regime. Overall, CND energies offer a practical lens for understanding and exploiting GW structure in algorithms; the paper notes that such properties are not universal, although they are commonly encountered in practice.
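
One instance of the concavity mechanism can be checked numerically. For the squared loss $L(x, y) = (x - y)^2$ and symmetric matrices $\mathbf{C}, \mathbf{D}$, the GW energy on $\Pi(\mathbf{a}, \mathbf{b})$ equals a constant minus $2\,\mathrm{tr}(\mathbf{C}\mathbf{P}\mathbf{D}\mathbf{P}^\top)$, and when both matrices are CND (for example squared Euclidean distance matrices, by Schoenberg's theorem) it is concave along every zero-marginal direction. The sketch below is an illustration under those assumptions, not code from the paper; `gw_energy`, `perm_coupling` and `sq_dist_matrix` are hypothetical helper names, and the random point clouds are arbitrary test data.

```python
import itertools
import random


def sq_dist_matrix(points):
    # Squared Euclidean distance matrix (assumed CND, by Schoenberg's theorem).
    return [[sum((u - v) ** 2 for u, v in zip(p, q)) for q in points] for p in points]


def gw_energy(C, D, P):
    # Discrete GW objective: sum_{i,j,k,l} (C_ik - D_jl)^2 P_ij P_kl.
    n, m = len(C), len(D)
    total = 0.0
    for i, j in itertools.product(range(n), range(m)):
        if P[i][j] == 0:
            continue  # zero mass contributes nothing
        for k, l in itertools.product(range(n), range(m)):
            total += (C[i][k] - D[j][l]) ** 2 * P[i][j] * P[k][l]
    return total


def perm_coupling(sigma):
    # Coupling with uniform marginals 1/n supported on the graph of sigma.
    n = len(sigma)
    return [[1.0 / n if sigma[i] == j else 0.0 for j in range(n)] for i in range(n)]


rng = random.Random(0)  # arbitrary seed; the checks below should hold for any seed
n = 4
C = sq_dist_matrix([(rng.random(), rng.random()) for _ in range(n)])
D = sq_dist_matrix([(rng.random(), rng.random(), rng.random()) for _ in range(n)])

# Midpoint concavity along the segment between two permutation couplings.
P, Q = perm_coupling([0, 1, 2, 3]), perm_coupling([1, 3, 0, 2])
mid = [[(p + q) / 2 for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]
gap = gw_energy(C, D, mid) - (gw_energy(C, D, P) + gw_energy(C, D, Q)) / 2

# A concave objective is minimised at a vertex, so the best permutation
# does at least as well as the independent coupling a b^T.
best = min(gw_energy(C, D, perm_coupling(s)) for s in itertools.permutations(range(n)))
indep = [[1.0 / n**2] * n for _ in range(n)]
print(f"midpoint gap = {gap:.3e} (expected >= 0)")
print(best <= gw_energy(C, D, indep) + 1e-12)
```

Brute force over all $n!$ permutations keeps the check transparent but limits it to very small $n$.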

Abstract

This note gives a self-contained overview of some important properties of the Gromov-Wasserstein (GW) distance, compared with the standard linear optimal transport (OT) framework. More specifically, I explore the following questions: are GW optimal transport plans sparse? Under what conditions are they supported on a permutation? Do they satisfy a form of cyclical monotonicity? In particular, I present the conditionally negative semi-definite property and show that, when it holds, there are GW optimal plans that are sparse and supported on a permutation.
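
The sparsity question rests on a classical fact about the coupling polytope $\Pi(\mathbf{a}, \mathbf{b})$ with $n$ rows and $m$ columns: each of its extreme points has at most $n + m - 1$ nonzero entries. Whenever an optimal plan can be taken at an extreme point, it is therefore sparse. The sketch below, an illustration rather than code from the note, builds one such extreme point with the north-west corner rule; `north_west_corner` is a hypothetical helper name, and exact `Fraction` arithmetic is an assumption made so that the marginals balance exactly.

```python
from fractions import Fraction


def north_west_corner(a, b):
    # Greedy coupling of marginals a and b (north-west corner rule).
    # Its support is a staircase path through the grid, so it has at most
    # len(a) + len(b) - 1 nonzero entries, like every extreme point of Pi(a, b).
    a, b = list(a), list(b)  # remaining mass; copies leave the inputs untouched
    n, m = len(a), len(b)
    P = [[Fraction(0)] * m for _ in range(n)]
    i = j = 0
    while i < n and j < m:
        t = min(a[i], b[j])
        P[i][j] = t
        a[i] -= t
        b[j] -= t
        if a[i] == 0:
            i += 1  # row i is saturated: move down
        else:
            j += 1  # column j is saturated: move right
    return P


a = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
b = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
P = north_west_corner(a, b)
nnz = sum(1 for row in P for x in row if x != 0)
print(nnz, "nonzeros; bound n + m - 1 =", len(a) + len(b) - 1)
```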

Paper Structure

This paper contains 16 sections, 14 theorems, 39 equations, and 1 figure.

Key Result

Theorem 2.1

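For any cost matrix $\mathbf{C}$, a coupling $\mathbf{P} \in \Pi(\mathbf{a}, \mathbf{b})$ is optimal for the linear OT problem $\min_{\mathbf{Q} \in \Pi(\mathbf{a}, \mathbf{b})} \langle \mathbf{C}, \mathbf{Q} \rangle$ if and only if, for any $N \in \mathbb{N}^{*}$, any $(i_1, j_1), \cdots, (i_N, j_N) \in \operatorname{supp}(\mathbf{P})^N$ and any permutation $\sigma \in \mathfrak{S}_N$,

$$\sum_{k=1}^{N} C_{i_k j_k} \leq \sum_{k=1}^{N} C_{i_k j_{\sigma(k)}}.$$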
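
The condition in Theorem 2.1 is $\mathbf{C}$-cyclical monotonicity of $\operatorname{supp}(\mathbf{P})$: reassigning columns among finitely many support pairs never lowers the summed cost. Since every permutation splits into cycles, it is enough to test cyclic shifts over ordered tuples of distinct support pairs. The brute-force checker below is a minimal illustrative sketch, not code from the paper; `is_cyclically_monotone` is a hypothetical name, and the toy cost $C_{ij} = |i - j|$ is an assumed example.

```python
import itertools


def is_cyclically_monotone(C, support, tol=1e-12):
    # Brute-force check of C-cyclical monotonicity for a finite support set.
    # Only cyclic shifts over distinct pairs are tested (permutations split
    # into cycles). Exponential cost: intended for tiny examples only.
    support = list(support)
    for N in range(2, len(support) + 1):
        for cyc in itertools.permutations(support, N):
            on_support = sum(C[i][j] for i, j in cyc)
            # Pair row i_k with the next column j_{k+1} (indices mod N).
            shifted = sum(C[cyc[k][0]][cyc[(k + 1) % N][1]] for k in range(N))
            if on_support > shifted + tol:
                return False, cyc  # a cycle that strictly lowers the cost
    return True, None


C = [[abs(i - j) for j in range(3)] for i in range(3)]
identity = [(0, 0), (1, 1), (2, 2)]  # support of the optimal plan I/3
anti = [(0, 2), (1, 1), (2, 0)]      # support of a non-optimal permutation plan
print(is_cyclically_monotone(C, identity))
print(is_cyclically_monotone(C, anti))
```

With uniform marginals, the identity plan has cost 0 and passes the check, while the anti-diagonal plan fails on the 2-cycle through $(0, 2)$ and $(2, 0)$, consistent with the theorem.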

Figures (1)

  • Figure 1: (Left) Bipartite graph $G(\mathbf{P})$ induced by $\mathbf{P}$; the edge weights are the values $P_{ij}$. (Right) The graph contains a 3-cycle $i_1, j_1, i_2, j_2, i_3, j_3, i_1$. The forward edges $i \to j$ are marked with a $+\varepsilon$ perturbation and the backward edges with a $-\varepsilon$ perturbation.
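
The perturbation in Figure 1 is the standard argument that a coupling whose support graph contains a cycle is not an extreme point: adding $+\varepsilon$ on forward edges and $-\varepsilon$ on backward edges leaves every marginal unchanged, so $\mathbf{P}$ is the midpoint of $\mathbf{P} + \varepsilon \mathbf{H}$ and $\mathbf{P} - \varepsilon \mathbf{H}$. The sketch below illustrates this under stated assumptions; `has_support_cycle` and `cycle_perturbation` are hypothetical helper names and the $3 \times 3$ plan is made-up data with a 3-cycle in its support.

```python
def has_support_cycle(P):
    # Union-find over the bipartite graph G(P): rows are nodes 0..n-1,
    # columns are nodes n..n+m-1, and each P_ij > 0 is an edge.
    n, m = len(P), len(P[0])
    parent = list(range(n + m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(m):
            if P[i][j] > 0:
                ri, rj = find(i), find(n + j)
                if ri == rj:
                    return True  # this edge closes a cycle
                parent[ri] = rj
    return False


def cycle_perturbation(n, m, cycle):
    # cycle = [(i_1, j_1), ..., (i_L, j_L)] encodes i_1, j_1, ..., i_L, j_L, i_1.
    # +1 on forward edges (i_k, j_k), -1 on backward edges (i_{k+1}, j_k),
    # indices mod L, so every row and column of H sums to zero.
    H = [[0] * m for _ in range(n)]
    L = len(cycle)
    for k, (i, j) in enumerate(cycle):
        H[i][j] += 1
        H[cycle[(k + 1) % L][0]][j] -= 1
    return H


P = [[0.2, 0.0, 0.1],
     [0.1, 0.2, 0.0],
     [0.0, 0.1, 0.3]]
H = cycle_perturbation(3, 3, [(0, 0), (1, 1), (2, 2)])
eps = 0.1  # smallest entry of P on the cycle keeps both perturbed plans nonnegative
plus = [[p + eps * h for p, h in zip(rp, rh)] for rp, rh in zip(P, H)]
minus = [[p - eps * h for p, h in zip(rp, rh)] for rp, rh in zip(P, H)]
print(has_support_cycle(P), H)
```

Repeating the move with the largest feasible $\varepsilon$ removes at least one edge per step, which is how a cyclic support can be reduced to a forest without changing the marginals.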

Theorems & Definitions (27)

  • Theorem 2.1
  • Proposition 2.2
  • Proof
  • Proposition 2.3
  • Proof
  • Theorem 2.4: Birkhoff
  • Proof
  • Corollary 2.5
  • Proof
  • Definition 3.1
  • ...and 17 more
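
Assuming Theorem 2.4 is the classical Birkhoff–von Neumann theorem, as its label suggests, every doubly stochastic matrix is a convex combination of permutation matrices, so permutations are exactly the extreme points of that polytope. The greedy decomposition below is a minimal sketch of this statement, not the paper's proof; `birkhoff_decomposition` is a hypothetical name, permutations are found by brute force (tiny $n$ only), and exact `Fraction` arithmetic is assumed to avoid round-off residue.

```python
import itertools
from fractions import Fraction


def birkhoff_decomposition(M):
    # Greedy Birkhoff-von Neumann decomposition of a doubly stochastic matrix M.
    # Returns (weight, permutation) pairs whose weighted sum rebuilds M.
    n = len(M)
    R = [list(row) for row in M]  # residual: always a multiple of a doubly stochastic matrix
    terms = []
    while any(x != 0 for row in R for x in row):
        # Hall's marriage theorem (the key step behind Birkhoff's theorem)
        # guarantees a permutation inside supp(R); find one by brute force.
        sigma = next(
            s for s in itertools.permutations(range(n))
            if all(R[i][s[i]] > 0 for i in range(n))
        )
        theta = min(R[i][sigma[i]] for i in range(n))
        for i in range(n):
            R[i][sigma[i]] -= theta  # zeroes at least one entry per step
        terms.append((theta, sigma))
    return terms


q = Fraction(1, 4)
M = [[2 * q, 2 * q, 0],
     [q, q, 2 * q],
     [q, q, 2 * q]]
for theta, sigma in birkhoff_decomposition(M):
    print(theta, sigma)
```

Each step zeroes at least one entry, so the loop stops after at most $n^2$ iterations; a matching algorithm would replace the brute-force search for larger $n$.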