Gromov-Wasserstein at Scale, Beyond Squared Norms

Guillaume Houry; Jean Feydy; François-Xavier Vialard

TL;DR

This work tackles the scalability of Gromov-Wasserstein (GW) matching, which avoids the rotation sensitivity of optimal transport at the price of a non-convex problem. It introduces CNT costs, which admit an embedding into a lifted feature space where GW behaves like a linear-algebraic alignment problem. The authors derive a dual formulation showing that $GW_{\nabla}$ with CNT costs reduces to a coupled optimization over a linear map $\Gamma$ and an optimal transport plan between the lifted spaces, which enables a memory-efficient alternating minimization that remains tractable for large point sets. They establish entropic debiasing and convergence guarantees for CNT-EGW, and present practical solvers (CNT-GW, Kernel-GW, and Multiscale-GW, or MsGW) with linear memory and quadratic time complexity that scale to hundreds of thousands of points. Empirically, CNT-based methods outperform state-of-the-art GW solvers by large margins, enable GW barycenters and landscape visualization, and reach speeds sufficient for near-real-time registration on datasets with up to ~177k points, which suggests broad applicability to high-resolution geometric tasks.
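The alternating scheme mentioned above can be illustrated on the simplest embeddable case, inner-product costs $c_\mathcal{X}(x,x') = \langle x, x'\rangle$ and $c_\mathcal{Y}(y,y') = \langle y, y'\rangle$. With fixed marginals, the quadratic GW objective then equals a constant minus $2\,\|\sum_{ij}\pi_{ij}\, x_i y_j^\top\|_F^2$, so one can alternate between a closed-form update of $\Gamma$ and an entropic OT problem with cost $-x_i^\top \Gamma\, y_j$. The NumPy sketch below is an illustration under these assumptions, not the authors' CNT solver: the function names, the default `eps`, and the iteration counts are hypothetical, and it stores a dense cost matrix, so it does not reproduce the linear memory footprint claimed for the paper's method.

```python
import numpy as np


def _logsumexp(M, axis):
    # Numerically stable log-sum-exp along one axis.
    m = M.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(M - m).sum(axis=axis))


def sinkhorn_plan(C, a, b, eps, n_iter=200):
    # Log-domain Sinkhorn iterations for entropic OT with cost matrix C
    # and marginals a (rows) and b (columns).
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros_like(a), np.zeros_like(b)
    for _ in range(n_iter):
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    # Primal plan recovered from the dual potentials.
    return np.exp((f[:, None] + g[None, :] - C) / eps
                  + log_a[:, None] + log_b[None, :])


def alternate_gw_inner_product(X, Y, a, b, eps=0.5, n_outer=20, seed=0):
    # Alternating minimization for inner-product costs (illustrative only).
    rng = np.random.default_rng(seed)
    Gamma = rng.standard_normal((X.shape[1], Y.shape[1]))  # random start
    for _ in range(n_outer):
        C = -(X @ Gamma) @ Y.T              # linear cost induced by Gamma
        P = sinkhorn_plan(C, a, b, eps)     # transport step
        Gamma = X.T @ P @ Y                 # closed-form alignment step
    return P, Gamma
```

Different seeds for the initial `Gamma` may converge to different local minima, since the underlying problem is non-convex.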

Abstract

A fundamental challenge in data science is to match disparate point sets with each other. While optimal transport efficiently minimizes point displacements under a bijectivity constraint, it is inherently sensitive to rotations. Conversely, minimizing distortions via the Gromov-Wasserstein (GW) framework addresses this limitation but introduces a non-convex, computationally demanding optimization problem. In this work, we identify a broad class of distortion penalties that reduce to a simple alignment problem within a lifted feature space. Leveraging this insight, we introduce an iterative GW solver with a linear memory footprint and quadratic (rather than cubic) time complexity. Our method is differentiable, comes with strong theoretical guarantees, and scales to hundreds of thousands of points in minutes. This efficiency unlocks a wide range of geometric applications and enables the exploration of the GW energy landscape, whose local minima encode the symmetries of the matching problem.

Paper Structure (88 sections, 23 theorems, 105 equations, 17 figures, 1 table, 13 algorithms)

Introduction
Point Set Registration.
Optimal Transport (OT).
Gromov-Wasserstein (GW).
Contributions.
Background and Notations
Optimal Transport Plans.
The Sinkhorn Algorithm.
Gromov-Wasserstein (GW).
Entropic Regularization.
Embeddable Costs.
Hilbert-Schmidt Operators.
Gromov-Wasserstein with CNT Costs
GW features.
Example.
...and 73 more sections

Key Result

Proposition 2.0

Let $c_\mathcal{X}$ and $c_\mathcal{Y}$ be two symmetric functions with non-negative values such that: Then, $GW(\alpha, \beta) = 0$ if and only if $\alpha$ and $\beta$ are isometric, i.e. there exists a map $I: \mathcal{X} \to \mathcal{Y}$ that pushes $\alpha$ onto $\beta$ such that for all $x$ and $x'$ in the support of $\alpha$: $c_\mathcal{Y}(I(x), I(x')) = c_\mathcal{X}(x, x')$.
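The "if" direction of this statement can be checked numerically on a toy case. The sketch below assumes the standard quadratic GW distortion $\sum_{i,j,k,l} (c_\mathcal{X}(x_i,x_k) - c_\mathcal{Y}(y_j,y_l))^2\, \pi_{ij}\pi_{kl}$ and uses squared Euclidean distances as one example of a cost; the helper names are hypothetical. It evaluates the distortion for the plan induced by a rotation followed by a relabeling (an isometry), and for the independent plan.

```python
import numpy as np


def sq_dists(Z):
    # Pairwise squared Euclidean distances within one point cloud.
    sq = (Z ** 2).sum(axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)


def gw_distortion(CX, CY, P):
    # sum_{ijkl} (CX[i,k] - CY[j,l])^2 P[i,j] P[k,l], expanded so that no
    # four-index tensor is ever materialized. CX and CY must be symmetric.
    p, q = P.sum(axis=1), P.sum(axis=0)
    term_x = p @ (CX ** 2) @ p
    term_y = q @ (CY ** 2) @ q
    cross = np.sum((CX @ P @ CY) * P)
    return term_x + term_y - 2.0 * cross


def isometric_example(n=30, theta=0.7, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    perm = rng.permutation(n)
    Y = (X @ R.T)[perm]                      # rotated, then relabeled copy
    P_iso = np.zeros((n, n))
    P_iso[perm, np.arange(n)] = 1.0 / n      # source perm[j] -> target j
    P_ind = np.full((n, n), 1.0 / n ** 2)    # independent coupling
    return sq_dists(X), sq_dists(Y), P_iso, P_ind
```

For the isometric plan the distortion vanishes up to floating-point error, while the independent coupling yields a strictly positive value.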

Figures (17)

Figure 1: (a) We optimize the GW objective of Eq. \ref{eq:GW_loss} to match a source distribution of points in the unit square ("C") with a target ("S"). Colors let us visualize the destination of every source point. (b) The preservation of squared distances between points has been studied extensively, but prioritizes the alignment of principal axes over smoothness. We propose a scalable GW solver for a broad class of penalties that may promote the preservation of topology (c) or find a balance between local and global structure (d). (e) This opens the door to applications on high-resolution data, such as these silhouettes sampled with 177k points each.
Figure 2: (left) In order to match a source distribution $\alpha$ (red "C") with a target distribution $\beta$ (blue "u"), the GW objective of Eq. \ref{eq:GW_loss} penalizes distortions between corresponding pairs of points. (right) Corollary \ref{corrolary:egw_icp_pi} shows that up to known embeddings in Hilbert spaces and the optimization of a linear alignment $\Gamma$, we can reduce this problem to the computation of an OT plan for the squared norm cost.
Figure 3: Wasserstein gradient flows $\alpha_{t+\delta t} = \alpha_{t} - \delta t \nabla V(\alpha_t, \beta)$ transporting a source measure $\alpha_0$ (red cross) towards a target $\beta$ (blue heart). With $\mathrm{GW}_\varepsilon$, entropic bias causes the limit measure to collapse into small clusters. The debiased Sinkhorn GW divergence $\mathrm{SGW}_\varepsilon$ fixes this issue, as the flow converges to the target. At large temperature $\varepsilon$, convergence fails and the flow remains cross-shaped.
Figure 4: Optimization landscape of the EGW problem, visualized in a dual plane. We match a source point cloud (with arms raised) to a target pose (running). We highlight the $8$ best minima, the corresponding EGW losses and the proportions of random seeds $\Gamma_0$ that fell in their attraction basins.
Figure 5: (a) Interpolating between the first (top) and seventh (bottom) thoracic vertebrae. We compute barycenters on normalized data, and align them with true anatomical positions for visualization purposes. (b) Wasserstein barycenters are now affordable, but create many topological artifacts \cite{agueh2011barycenters}. (c) We compute GW barycenters for the Euclidean cost in $\mathbb{R}^3$. The GW metric puts more emphasis on topology preservation and could become a versatile baseline for 3D shape analysis.
...and 12 more figures
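
For readers unfamiliar with the debiasing mentioned in the caption of Figure 3, one plausible form of the divergence, assumed here by analogy with the usual Sinkhorn divergence of entropic optimal transport, subtracts the two self-matching terms from the entropic GW cost. The exact definition used in the paper may differ.

$$
\mathrm{SGW}_\varepsilon(\alpha, \beta) \;=\; \mathrm{GW}_\varepsilon(\alpha, \beta) \;-\; \tfrac{1}{2}\,\mathrm{GW}_\varepsilon(\alpha, \alpha) \;-\; \tfrac{1}{2}\,\mathrm{GW}_\varepsilon(\beta, \beta)
$$

Under this form, $\mathrm{SGW}_\varepsilon(\alpha, \alpha) = 0$ by construction, which is consistent with the flow in Figure 3 converging to the target instead of collapsing into clusters.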

Theorems & Definitions (38)

Proposition 2.0
Theorem 2.1 \cite{schoenberg1938metric}
Proposition 2.2 \cite{schoenberg1938metric}
Definition 3.0: GW-embeddings
Theorem 3.1
Theorem 3.2
Theorem 3.3
Theorem 4.1
Theorem 4.2
Proposition 4.2
...and 28 more
