Optimal Transportation and Alignment Between Gaussian Measures
Sanjit Dandapanthula, Aleksandr Podkopaev, Shiva Prasad Kasiviswanathan, Aaditya Ramdas, Ziv Goldfeld
TL;DR
The paper develops the theory and computation of optimal transport (OT) and Gromov-Wasserstein alignment for Gaussian measures under quadratic costs. For inner product Gromov-Wasserstein (IGW) alignment between uncentered Gaussians on separable Hilbert spaces, it reduces the distance to an optimization over unitary operators and derives tight upper and lower bounds; when at least one measure is centered, the expression is fully closed-form, and it extends to IGW barycenters of centered Gaussians in infinite dimensions. For multimarginal OT, it provides a tractable, rank-aware semidefinite formulation with a scalable Burer-Monteiro algorithm. The results are demonstrated on language-model representation comparison and heterogeneous clustering tasks, highlighting both practical utility and computational efficiency. Together, they enable efficient, geometry-preserving comparison and aggregation of heterogeneous, high-dimensional Gaussian representations. Potential directions include convergence analysis of Riemannian gradient methods for IGW and deeper exploration of the GBW geometry on covariance operators.
Abstract
Optimal transport (OT) and Gromov-Wasserstein (GW) alignment provide interpretable geometric frameworks for comparing, transforming, and aggregating heterogeneous datasets -- tasks ubiquitous in data science and machine learning. Because these frameworks are computationally expensive, large-scale applications often rely on closed-form solutions for Gaussian distributions under quadratic cost. This work provides a comprehensive treatment of quadratic-cost OT and inner product GW (IGW) alignment between Gaussian measures, closing several gaps in the literature to broaden applicability. First, we treat the open problem of IGW alignment between uncentered Gaussians on separable Hilbert spaces by giving a closed-form expression up to a quadratic optimization over unitary operators, for which we derive tight analytic upper and lower bounds. If at least one Gaussian measure is centered, the solution reduces to a fully closed-form expression, which we further extend to an analytic solution for the IGW barycenter between centered Gaussians. We also present a reduction of Gaussian multimarginal OT with pairwise quadratic costs to a tractable optimization problem and provide an efficient algorithm to solve it using a rank-deficiency constraint. To demonstrate utility, we apply our results to knowledge distillation and heterogeneous clustering on synthetic and real-world datasets.
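For context on what a closed-form solution for Gaussians under quadratic cost looks like, the sketch below evaluates the classical squared 2-Wasserstein distance between two finite-dimensional Gaussians, ||m_a - m_b||^2 + tr(S_a + S_b - 2 (S_a^{1/2} S_b S_a^{1/2})^{1/2}). This is the standard OT formula, not the IGW expressions or the multimarginal algorithm from this work; the function names `psd_sqrt` and `gaussian_w2_squared` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def psd_sqrt(mat):
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    sym = (mat + mat.T) / 2.0  # symmetrize to absorb round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)  # clip tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2_squared(mean_a, cov_a, mean_b, cov_b):
    """Squared 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    # Covariance (Bures) term: tr(S_a) + tr(S_b) - 2 tr((S_a^{1/2} S_b S_a^{1/2})^{1/2})
    bures = np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross)
    return float(np.sum((mean_a - mean_b) ** 2) + bures)


# Illustrative inputs (made up): commuting diagonal covariances, where the
# covariance term reduces to the squared difference of standard deviations.
mean_a, cov_a = np.zeros(2), np.diag([1.0, 4.0])
mean_b, cov_b = np.zeros(2), np.diag([9.0, 16.0])
distance_sq = gaussian_w2_squared(mean_a, cov_a, mean_b, cov_b)
```

The cost is dominated by two eigendecompositions of d-by-d matrices, so no sampling or iterative solver is needed, which is the kind of saving that motivates Gaussian closed forms at scale.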
