Gromov-Wasserstein-like Distances in the Gaussian Mixture Models Space

Antoine Salmona; Julie Delon; Agnès Desolneux

TL;DR

This work introduces two isometry-invariant, Gromov-Wasserstein-type optimal transport distances for Gaussian mixture models (GMMs): MGW2, a "Gromovized" version of the Mixture-Wasserstein distance MW2, and EW2, whose GMM specialization MEW2 yields explicit transport plans between GMMs. MGW2 reduces the GW problem to a small-scale discrete coupling problem over mixture components, which makes distance computation efficient even between GMMs living in spaces of different dimensions. EW2 instead optimizes over an isometric transformation, which provides a structured way to derive transport maps. The authors propose annealing strategies and plan-design tricks to mitigate nonconvexity, and report competitive performance on shape matching and hyperspectral image color transfer for medium- to large-scale problems, with practical advantages over traditional GW solvers. The results suggest that MGW2 and EW2 are scalable, invariance-aware tools for comparing clustered distributions and extracting correspondences between heterogeneous, possibly high-dimensional datasets.

Abstract

The Gromov-Wasserstein (GW) distance is frequently used in machine learning to compare distributions across distinct metric spaces. Despite its utility, it remains computationally intensive, especially for large-scale problems. Recently, a novel Wasserstein distance specifically tailored for Gaussian mixture models (GMMs) and known as MW2 (mixture Wasserstein) has been introduced by several authors. In scenarios where data exhibit clustering, this approach simplifies to a small-scale discrete optimal transport problem, whose complexity depends solely on the number of Gaussian components in the GMMs. This paper aims to incorporate invariance properties into MW2. This is done by introducing new Gromov-type distances, designed to be isometry-invariant in Euclidean spaces and applicable for comparing GMMs across different dimensional spaces. Our first contribution is the Mixture Gromov Wasserstein distance (MGW2), which can be viewed as a "Gromovized" version of MW2. This new distance has a straightforward discrete formulation, making it highly efficient for estimating distances between GMMs in practical applications. To facilitate the derivation of a transport plan between GMMs, we present a second distance, the Embedded Wasserstein distance (EW2). This distance turns out to be closely related to several recent alternatives to Gromov-Wasserstein. We show that EW2 can be adapted to derive a distance as well as optimal transportation plans between GMMs. We demonstrate the efficiency of these newly proposed distances on medium- to large-scale problems, including shape matching and hyperspectral image color transfer.
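
To make the "small-scale discrete optimal transport problem" concrete, here is a minimal sketch of how MW2 can be evaluated between two GMMs. It uses the standard closed form for the squared 2-Wasserstein distance between Gaussians as the ground cost, then solves a linear program over the component weights. The function names (`gaussian_w2_squared`, `mw2_squared`) and the choice of SciPy's generic `linprog` solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog


def gaussian_w2_squared(m0, S0, m1, S1):
    """Closed-form squared 2-Wasserstein distance between N(m0, S0) and N(m1, S1)."""
    S0_half = np.real(sqrtm(S0))
    cross = np.real(sqrtm(S0_half @ S1 @ S0_half))
    bures = np.trace(S0 + S1 - 2.0 * cross)
    # Clip tiny negative values caused by round-off in the matrix square roots.
    return float(np.sum((m0 - m1) ** 2) + max(bures, 0.0))


def mw2_squared(a, means_a, covs_a, b, means_b, covs_b):
    """Hypothetical helper: min over couplings w of sum_kl w[k, l] * W2^2(mu_k, nu_l)."""
    K, L = len(a), len(b)
    cost = np.array([[gaussian_w2_squared(means_a[k], covs_a[k], means_b[l], covs_b[l])
                      for l in range(L)] for k in range(K)])
    # Marginal constraints on the row-major flattened plan:
    # rows of w sum to a, columns of w sum to b.
    A_eq = np.zeros((K + L, K * L))
    for k in range(K):
        A_eq[k, k * L:(k + 1) * L] = 1.0
    for l in range(L):
        A_eq[K + l, l::L] = 1.0
    b_eq = np.concatenate([a, b])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(K, L)


# Two 2-component GMMs in R^2; the second is the first shifted by (0, 3),
# with its components listed in the opposite order.
weights = np.array([0.5, 0.5])
covs = [np.eye(2), np.eye(2)]
means_mu = [np.array([0.0, 0.0]), np.array([4.0, 0.0])]
means_nu = [np.array([4.0, 3.0]), np.array([0.0, 3.0])]
value, plan = mw2_squared(weights, means_mu, covs, weights, means_nu, covs)
print(value)  # matched components differ by a shift of length 3, so this should be close to 9
print(plan)
```

The linear program has only K x L variables, so its size depends on the number of components and not on the number of data points used to fit the mixtures.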

Paper Structure

This paper contains 46 sections, 15 theorems, 127 equations, 11 figures, 1 table, 3 algorithms.

Introduction
Contributions of the paper.
Background: Mixture-Wasserstein and Gromov-Wasserstein-type distances
Mixture-Wasserstein distance between GMMs
Gromov-Wasserstein distance
Other invariant distances
Gromov-Wasserstein distance between mixture of Gaussians
Metric properties
$MGW_2$ in practice
Using $MGW_2$ on discrete data distributions.
Difficulty of designing a transportation plan.
Embedded Wasserstein distance
Properties of $EW_2$
Embedded Wasserstein distance between GMMs
Numerical solver
...and 31 more sections

Key Result

Proposition 1

In the following, $\mu = \sum_k a_k\mu_k$ and $\nu = \sum_l b_l\nu_l$ are two GMMs respectively in $GMM_K(\mathbb{R}^d)$ and $GMM_L(\mathbb{R}^{d'})$.
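
With the notation above, a discrete Gromov-type comparison only needs the pairwise costs between components inside each mixture, so $d$ and $d'$ may differ. The sketch below assumes that the intra-mixture cost is the squared $W_2$ distance between Gaussian components and that the objective has the usual quadratic Gromov-Wasserstein form; both are assumptions for illustration, since the exact formulation is not stated in this summary. The helper names (`intra_costs`, `gw_objective`) are hypothetical, and the code only evaluates the objective for a given coupling rather than minimizing it.

```python
import numpy as np
from scipy.linalg import sqrtm


def gaussian_w2_squared(m0, S0, m1, S1):
    """Closed-form squared 2-Wasserstein distance between two Gaussians."""
    S0_half = np.real(sqrtm(S0))
    cross = np.real(sqrtm(S0_half @ S1 @ S0_half))
    return float(np.sum((m0 - m1) ** 2) + max(np.trace(S0 + S1 - 2.0 * cross), 0.0))


def intra_costs(means, covs):
    """Matrix of squared W2 distances between the components of one GMM."""
    n = len(means)
    return np.array([[gaussian_w2_squared(means[i], covs[i], means[j], covs[j])
                      for j in range(n)] for i in range(n)])


def gw_objective(C_mu, C_nu, w):
    """sum over k,i,l,j of (C_mu[k,i] - C_nu[l,j])^2 * w[k,l] * w[i,j]."""
    diff = C_mu[:, :, None, None] - C_nu[None, None, :, :]  # axes (k, i, l, j)
    return float(np.einsum("kilj,kl,ij->", diff ** 2, w, w))


# mu has K = 2 components in R^2, nu has L = 3 components in R^3.
a = np.array([0.4, 0.6])
means_mu = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]
covs_mu = [np.eye(2), np.diag([2.0, 0.5])]
b = np.array([0.2, 0.3, 0.5])
means_nu = [np.zeros(3), np.array([1.0, 2.0, 0.0]), np.array([0.0, 0.0, 4.0])]
covs_nu = [np.eye(3), 2.0 * np.eye(3), np.diag([1.0, 0.5, 3.0])]

C_mu = intra_costs(means_mu, covs_mu)   # shape (2, 2)
C_nu = intra_costs(means_nu, covs_nu)   # shape (3, 3)
w = np.outer(a, b)                      # independent coupling: feasible, generally not optimal
value = gw_objective(C_mu, C_nu, w)

# Rotating every component of mu by the same angle leaves C_mu unchanged,
# which is the source of isometry invariance for any cost built on C_mu alone.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
C_rot = intra_costs([R @ m for m in means_mu], [R @ S @ R.T for S in covs_mu])
print(value, np.max(np.abs(C_mu - C_rot)))
```

Minimizing this objective over couplings is a nonconvex quadratic problem, but its size is governed by K and L only.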

Figures (11)

Figure 1: Transport plans between two discrete centered distributions on $\mathbb{R}^2$ composed of three points. Left: optimal coupling given by the maximization of Problem \ref{eq:norm2}. Right: optimal coupling given by the maximization of Problem \ref{eq:norm1}.
Figure 2: Left first column: spiral datasets (in blue and red) composed of $150$ points of $\mathbb{R}^2$. The red dataset corresponds to points sampled from the distribution of the blue dataset rotated by $\pi$. Left second column: the two corresponding GMMs with $20$ components learned via the EM algorithm (each color corresponds to a Gaussian component of the GMMs). Right: evolution of $MGW_2$, $GW_2$, $MW_2$, and $W_2$ between the initial distribution (in blue) and the rotated ones as a function of the rotation angle. Experiments are averaged over $10$ runs and the colored bands correspond to $\pm$ the standard deviation. This experiment is inspired by titouan2019sliced.
Figure 3: Left: two discrete distributions $\hat{\mu}$ (in gradient of colors) and $\hat{\nu}$ (in blue) that have been drawn from two GMMs. The colors have been added to $\hat{\mu}$ in order to visualize the couplings between $\hat{\mu}$ and $\hat{\nu}$. Middle and right: two possible transports of $\hat{\mu}$ obtained by plugging the discrete plan that minimizes $MGW_2$ into \ref{eq:optiplan}, using restricted-$GW_2$ transport maps (salmona2021gromov) to transport the Gaussian components. Observe that the middle solution preserves the global structure of the mixture, in the sense that points that are close to each other but associated with different Gaussian components remain close when transported. This is not the case for the right solution.
Figure 4: Left: two discrete distributions $\hat{\mu}$ (in gradient of colors) and $\hat{\nu}$ (in blue) that have been drawn from two GMMs. The colors have been added to $\hat{\mu}$ in order to visualize the couplings between $\hat{\mu}$ and $\hat{\nu}$. Middle: transport of $\hat{\mu}$ obtained by solving the $MGW_2$ problem, then deriving $P_{MGW_2} \in \mathbb{V}_2(\mathbb{R}^2)$ by solving Problem \ref{eq:Pmgw2}. Right: transport of $\hat{\mu}$ obtained by solving the $MEW_2$ problem.
Figure 5: MDS on the galloping horse animation using the $MGW_2$ distance (left) and the $MEW_2$ distance (middle). Each point corresponds to a given mesh, and the meshes are colored according to their index in the sequence. Right: $4$ examples among the $45$ meshes that compose the sequence. Both distances are computed by first fitting GMMs with $20$ components on each mesh independently.
...and 6 more figures

Theorems & Definitions (38)

Definition 1
Proposition 1
Proof: Sketch of proof.
Proposition 2
Proof: Sketch of proof.
Example 1
Definition 2
Proposition 3
Proof: Sketch of proof.
Proposition 4
...and 28 more
