Uncovering Challenges of Solving the Continuous Gromov-Wasserstein Problem

Xavier Aramayo Carrasco; Maksim Nekrashevich; Petr Mokrov; Evgeny Burnaev; Alexander Korotin

TL;DR

This work addresses the challenging problem of continuous Gromov-Wasserstein OT, which seeks a parametric map $T^*:\mathbb{R}^{d_x}\to\mathbb{R}^{d_y}$ between unknown distributions from samples. It provides formal background on OT and GWOT, highlights that existing solvers largely rely on discrete approximations, and shows their performance deteriorates when source and target data are uncorrelated. The authors introduce NeuralGW, a neural minimax solver that does not depend on discrete GW and can scale to large datasets, offering improved performance on uncorrelated data while facing stability and initialization challenges. Overall, the paper demonstrates that data correlation heavily governs GWOT success and calls for developing reliable, general-purpose continuous GWOT solvers with practical applicability. Key contributions include a minimax reformulation of innerGW, a scalable neural architecture for $f$ and $T$, and extensive large-scale benchmarks revealing both strengths and limitations of current approaches.

Abstract

Recently, the Gromov-Wasserstein Optimal Transport (GWOT) problem has attracted special attention from the ML community. In this problem, given two distributions supported on two (possibly different) spaces, one has to find the most isometric map between them. In the discrete variant of GWOT, the task is to learn an assignment between given discrete sets of points. In the more advanced continuous formulation, one aims to recover a parametric mapping between unknown continuous distributions based on i.i.d. samples drawn from them. The clear geometrical intuition behind GWOT makes it a natural choice for several practical use cases, giving rise to a number of proposed solvers. Some of them claim to solve the continuous version of the problem. At the same time, GWOT is notoriously hard, both theoretically and numerically. Moreover, all existing continuous GWOT solvers still rely heavily on discrete techniques. Natural questions arise: to what extent do existing methods solve the GWOT problem, what difficulties do they encounter, and under which conditions are they successful? Our benchmark paper is an attempt to answer these questions. We specifically focus on continuous GWOT as the most interesting and debatable setup. We crash-test existing continuous GWOT approaches on different scenarios, carefully record and analyze the obtained results, and identify issues. Our findings provide experimental evidence that the scientific community is still missing a reliable continuous GWOT solver, which necessitates further research efforts. As the first step in this direction, we propose a new continuous GWOT method which does not rely on discrete techniques and partially solves some of the problems of the competitors.
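To make the discrete variant described in the abstract concrete, here is a minimal sketch in plain Python. It assumes the inner-product cost (the squared mismatch between pairwise inner products, which is what the name innerGW in the TL;DR is taken to mean here) and uses exhaustive search over permutations. The function names and the toy points are hypothetical, and this is not any of the solvers benchmarked in the paper.

```python
from itertools import permutations


def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gw_distortion(xs, ys, assignment):
    """Average squared mismatch of pairwise inner products.

    xs, ys: equally long lists of source and target points.
    assignment[i] is the index of the target point matched to xs[i].
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            src = inner(xs[i], xs[j])
            tgt = inner(ys[assignment[i]], ys[assignment[j]])
            total += (src - tgt) ** 2
    return total / (n * n)


def brute_force_assignment(xs, ys):
    """Exhaustive search over permutations; only viable for tiny inputs."""
    return min(
        permutations(range(len(ys))),
        key=lambda p: gw_distortion(xs, ys, p),
    )


# Toy data (hypothetical): three 3D source points and the same points
# with the zero third coordinate dropped, listed in shuffled order.
xs = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (3.0, 3.0, 0.0)]
ys = [(0.0, 2.0), (3.0, 3.0), (1.0, 0.0)]

best = brute_force_assignment(xs, ys)
```

The search space grows factorially with the number of points, which gives a rough sense of why discrete GWOT is hard and why practical solvers resort to relaxations. The spaces may have different dimensions (3D and 2D here) because only within-space inner products are compared.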

Paper Structure (31 sections, 2 theorems, 18 equations, 13 figures, 5 tables, 1 algorithm)


Introduction
Background
Optimal Transport (OT) problem
Gromov-Wasserstein OT (GWOT) problem
Practical Learning Setup
Continuous Gromov-Wasserstein solvers
Limitations of existing methods
Pitfalls of practical data setup
Benchmarking GWOT solvers on (un)correlated data: GloVe and BPEmb Experiments
GWOT solvers at large scale
Neural Gromov-Wasserstein Solver
Practical performance of NeuralGW and baselines at large scale
Discussion
LLM Usage.
Proofs of theorems and lemmas
...and 16 more sections

Key Result

Lemma 5.1

It holds that the innerGW problem (eq:maxot) is equivalent to a minimax optimization problem.

Figures (13)

Figure 1: A schematic visualization of the OT problems and GW problems (Monge's form).
Figure 2: Data splitting and (un)correlatedness.
Figure 3: Performance of the baseline GWOT solvers for Twitter-GloVe and MUSE-BPEmb (English) at different correlatedness levels $\alpha$ for the $100 \rightarrow 50$ setup. Solvers were trained with $N_{\text{train}}/2 = 3000$ samples from spaces of 400K and 90K, respectively; the plot shows results on a 2048-sample test subset. Accuracy metrics were computed over the combined reference space of $N_{\text{train}} + N_{\text{test}} = 8048$ samples.
Figure 4: Performance of the baseline GWOT solvers for the Twitter-GloVe and MUSE-BPEmb (English) embeddings at different levels of correlatedness $\alpha$ for the $100 \rightarrow 50$ dimensionality setup. The solvers were trained with $N_{\text{train}}/2 = 380\text{K}$ and $N_{\text{train}}/2 = 88\text{K}$ samples, respectively. The plot shows results on a 2048-sample test subset; the accuracy metrics were computed over reference spaces of 400K and 90K samples, respectively.
Figure 5: GWOT map $T$ learned by different solvers; Toy (3D$\rightarrow$2D) experiment.
...and 8 more figures

Theorems & Definitions (2)

Lemma 5.1: InnerGW as a minimax optimization
Theorem 5.2: Optimal maps solve the minimax problem
