Optimal Transportation and Alignment Between Gaussian Measures

Sanjit Dandapanthula, Aleksandr Podkopaev, Shiva Prasad Kasiviswanathan, Aaditya Ramdas, Ziv Goldfeld

TL;DR

The paper develops the theory and computation of optimal transport (OT) and inner product Gromov‑Wasserstein (IGW) alignment for Gaussian measures under quadratic costs. For uncentered Gaussians on separable Hilbert spaces, it reduces the IGW distance to a quadratic optimization over unitary operators and derives tight analytic upper and lower bounds; when at least one Gaussian is centered, the distance has a fully closed form, which extends to an analytic IGW barycenter between centered Gaussians in infinite dimensions. It also provides a tractable, rank‑aware semidefinite formulation for multimarginal OT, with a scalable Burer‑Monteiro algorithm. The work is demonstrated on language‑model representation comparison and heterogeneous clustering tasks. Together, these results enable efficient, geometry‑preserving comparison and aggregation of heterogeneous, high‑dimensional Gaussian representations. Potential directions include convergence analysis of Riemannian gradient methods for IGW and deeper exploration of the Gromov‑Bures‑Wasserstein (GBW) geometry on covariance operators.

Abstract

Optimal transport (OT) and Gromov-Wasserstein (GW) alignment provide interpretable geometric frameworks for comparing, transforming, and aggregating heterogeneous datasets -- tasks ubiquitous in data science and machine learning. Because these frameworks are computationally expensive, large-scale applications often rely on closed-form solutions for Gaussian distributions under quadratic cost. This work provides a comprehensive treatment of Gaussian, quadratic cost OT and inner product GW (IGW) alignment, closing several gaps in the literature to broaden applicability. First, we treat the open problem of IGW alignment between uncentered Gaussians on separable Hilbert spaces by giving a closed-form expression up to a quadratic optimization over unitary operators, for which we derive tight analytic upper and lower bounds. If at least one Gaussian measure is centered, the solution reduces to a fully closed-form expression, which we further extend to an analytic solution for the IGW barycenter between centered Gaussians. We also present a reduction of Gaussian multimarginal OT with pairwise quadratic costs to a tractable optimization problem and provide an efficient algorithm to solve it using a rank-deficiency constraint. To demonstrate utility, we apply our results to knowledge distillation and heterogeneous clustering on synthetic and real-world datasets.
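As a concrete instance of the closed-form Gaussian solutions the abstract refers to, the sketch below evaluates the standard finite-dimensional Bures-Wasserstein formula, $\mathrm{W}_2^2 = \lVert m_1 - m_2\rVert^2 + \operatorname{tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$, and the associated affine OT map. The function names and the example means and covariances are illustrative assumptions, not taken from the paper, and the map assumes $\Sigma_1$ is invertible.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric PSD square root via an eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_w2_squared(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r = psd_sqrt(S1)
    cross = psd_sqrt(r @ S2 @ r)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

def gaussian_ot_map(m1, S1, m2, S2):
    """Affine OT map x -> m2 + A (x - m1); assumes S1 is invertible."""
    r = psd_sqrt(S1)
    r_inv = np.linalg.inv(r)
    A = r_inv @ psd_sqrt(r @ S2 @ r) @ r_inv  # symmetric PSD matrix
    return A, (lambda x: m2 + (x - m1) @ A.T)

# Illustrative (hypothetical) parameters.
m1, S1 = np.array([0.0, 0.0]), np.array([[2.0, 0.5], [0.5, 1.0]])
m2, S2 = np.array([1.0, -1.0]), np.array([[1.0, -0.3], [-0.3, 3.0]])

dist_sq = gaussian_w2_squared(m1, S1, m2, S2)
A, transport = gaussian_ot_map(m1, S1, m2, S2)

# The map pushes N(m1, S1) onto N(m2, S2): the covariance A S1 A^T equals S2.
pushed_cov = A @ S1 @ A.T
```

The whole computation costs a few eigendecompositions of $d \times d$ matrices, which is why Gaussian closed forms are attractive at scale compared with solving a general OT problem on samples.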

Paper Structure

This paper contains 41 sections, 16 theorems, 107 equations, 12 figures, 1 table.

Key Result

Proposition 2.1

Suppose that $\rho \in \mathcal{P}(\mathcal{P}_2(\mathbb{R}^d))$ has $\operatorname{supp}(\rho) = \{\mu_1, \ldots, \mu_p\}$ for absolutely continuous measures $\mu_i$. Define the weighted-average map $g(x_{\mu_1}, \dots, x_{\mu_p}) = \sum_{i=1}^p \rho(\{\mu_i\})\, x_{\mu_i}$ for $x_{\mu_1}, \dots, x_{\mu_p} \in \mathbb{R}^d$. If $\pi \in \Pi(\mu_1, \ldots, \mu_p)$ solves the multimarginal OT problem between $\mu_1, \ldots, \mu_p$, then $g_\# \pi$ is a $\rho$-weighted 2-Wasserstein barycenter.
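A one-dimensional sanity check of this statement can be sketched as follows, assuming $g$ is the $\rho$-weighted average of its arguments (as in the Agueh-Carlier setting). In one dimension the monotone coupling, which matches the $k$-th smallest sample of every marginal, solves the quadratic multimarginal problem, so pushing sorted samples through the weighted average yields barycenter samples. For Gaussians $N(m_i, s_i^2)$ with weights $w_i$, the barycenter is $N(\sum_i w_i m_i, (\sum_i w_i s_i)^2)$. The weights, means, and standard deviations below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical weights rho(mu_i) and 1D Gaussian marginals mu_i = N(m_i, s_i^2).
weights = np.array([0.5, 0.3, 0.2])
means = np.array([-2.0, 0.0, 3.0])
stds = np.array([1.0, 0.5, 2.0])

# Column i holds n i.i.d. samples from mu_i.
samples = rng.normal(means, stds, size=(n, 3))

# Monotone coupling: row k of the sorted array pairs the k-th order
# statistics of all marginals, an empirical multimarginal OT plan in 1D.
coupled = np.sort(samples, axis=0)

# Push the coupling forward through the weighted-average map g.
barycenter_samples = coupled @ weights

# Closed-form 1D Gaussian barycenter parameters for comparison.
expected_mean = float(weights @ means)
expected_std = float(weights @ stds)

empirical_mean = float(barycenter_samples.mean())
empirical_std = float(barycenter_samples.std())
```

With these inputs, the empirical mean and standard deviation should land close to `expected_mean` and `expected_std`, up to sampling error.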

Figures (12)

  • Figure 1: (a) The OT problem compares measures over the same space by searching for the coupling $\pi$ which minimizes expected transportation cost $c(x, y)$. Here we depict the quadratic-cost OT problem, with $c(x, y) = \lVert x - y\rVert_2^2$. (b) GW alignment compares measures over (possibly different) spaces by searching for the coupling $\pi$ which matches the kernels $k_{\mathcal{X}}$ and $k_{\mathcal{Y}}$ as well as possible. Here, we depict the inner product Gromov-Wasserstein (IGW) problem with $k_{\mathcal{X}}(\cdot,\, \cdot) = k_{\mathcal{Y}}(\cdot,\, \cdot)= \langle\cdot,\, \cdot\rangle$, which seeks to find a coupling which is as close to unitary as possible (roughly, preserving pairwise angles).
  • Figure 2: Comparison of displacement interpolations between two Gaussian distributions (origin marked in red). (a) has the origin at the bottom and (b) has the origin at the top; note that the IGW interpolation prefers to rotate the measures around the origin while the $\mathrm{W}_2$ interpolation is invariant to translation of both measures. Top row of (a) and (b): IGW displacement interpolation between two Gaussians with OT map estimated using RGD. Bottom row of (a) and (b): 2-Wasserstein displacement interpolation with OT map from the Bures-Wasserstein formula in \ref{eq:gaussian-ot-map}.
  • Figure 3: Contour plots of the $\rho$-weighted 2-Wasserstein barycenter and $\rho$-weighted IGW barycenter. The 2-Wasserstein barycenter does not preserve the covariance structure of the input measures, but the IGW barycenter naturally does.
  • Figure 4: Comparison of multimarginal OT between two sets of Gaussian distributions. Top row: multimarginal OT between three misaligned Gaussians. Bottom row: multimarginal OT between three aligned Gaussians.
  • Figure 5: First two principal components of the embeddings produced by the model on the (a) and (b) datasets (blue), along with contour lines from a Gaussian fit (red). The embeddings are approximately Gaussian.
  • ...and 7 more figures
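The IGW problem described in the Figure 1 caption can be made concrete on finite samples. The sketch below evaluates the standard IGW objective $\sum_{i,j,k,l} (\langle x_i, x_k\rangle - \langle y_j, y_l\rangle)^2 P_{ij} P_{kl}$ for a given coupling matrix $P$; it only evaluates the objective and does not solve for the optimal coupling. The helper name `igw_objective` and the synthetic data are assumptions made for illustration. Embedding the points isometrically into a higher-dimensional space preserves all inner products, so the identity coupling attains objective zero, whereas the independent coupling does not.

```python
import numpy as np

def igw_objective(X, Y, P):
    """IGW objective of a coupling P between the rows of X and the rows of Y."""
    Gx, Gy = X @ X.T, Y @ Y.T           # inner-product kernels on each space
    p, q = P.sum(axis=1), P.sum(axis=0)  # marginals of the coupling
    term_x = p @ (Gx ** 2) @ p
    term_y = q @ (Gy ** 2) @ q
    cross = np.sum(P * (Gx @ P @ Gy))
    return float(term_x + term_y - 2.0 * cross)

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))              # points in R^2

# Isometric embedding R^2 -> R^3: U has orthonormal columns, so Q Q^T = I_2.
U, _ = np.linalg.qr(rng.normal(size=(3, 2)))
Q = U.T
Y = X @ Q                                 # points in R^3 with the same Gram matrix

identity_coupling = np.eye(n) / n         # matches x_i with its own image y_i
independent_coupling = np.full((n, n), 1.0 / n**2)

obj_identity = igw_objective(X, Y, identity_coupling)
obj_independent = igw_objective(X, Y, independent_coupling)
```

Because the objective depends on the data only through the Gram matrices, it compares measures living in spaces of different dimension without requiring a shared coordinate system.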

Theorems & Definitions (42)

  • Proposition 2.1: Agueh and Carlier (2011), Proposition 4.2
  • Theorem 3.1: IGW between Gaussians
  • Remark 3.1: Finite-dimensional case
  • Theorem 3.2: Analytic bounds
  • Corollary 3.1: Gaussian bound
  • Corollary 3.2: Analytic solutions
  • Remark 3.2: Gromov-Bures-Wasserstein distance
  • Example 3.1: Gaussians on the plane
  • Definition 3.1: IGW barycenter
  • Proposition 3.3: Gaussian IGW barycenter
  • ...and 32 more