Gromov-Wasserstein Alignment of Word Embedding Spaces
David Alvarez-Melis, Tommi S. Jaakkola
TL;DR
This work reframes cross-lingual word embedding alignment as a Gromov-Wasserstein (GW) optimal transport problem: rather than comparing absolute vector positions, it compares the pairwise similarities within each language to learn a mapping between languages in a fully unsupervised manner. The authors develop an efficient GW-based objective, show that it can be solved in a single step with minimal tuning, and extend it to large vocabularies via a two-stage scaling approach. On standard benchmarks, the method performs competitively with state-of-the-art unsupervised methods while requiring substantially less computation and fewer hyperparameters. The work also offers a geometric view of embedding spaces that induces a notion of distance between languages, yielding interpretable qualitative insights into how languages relate.
Abstract
Cross-lingual or cross-domain correspondences play key roles in tasks ranging from machine translation to transfer learning. Recently, purely unsupervised methods operating on monolingual embeddings have become effective alignment tools. Current state-of-the-art methods, however, involve multiple steps, including heuristic post-hoc refinement strategies. In this paper, we cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms. Specifically, we exploit the Gromov-Wasserstein distance, which measures how similarities between pairs of words relate across languages. We show that our OT objective can be estimated efficiently, requires little or no tuning, and results in performance comparable with the state-of-the-art in various unsupervised word translation tasks.
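To make the relational idea concrete, the following minimal sketch evaluates a squared-loss Gromov-Wasserstein objective for a fixed coupling between two toy embedding spaces. It is an illustration under stated assumptions, not the authors' implementation: the 2-D vectors, the use of cosine distance, the uniform word weights, and all function names are hypothetical choices made for this example. It evaluates the objective only and does not solve for the optimal coupling. The target space is a rotated, reordered copy of the source, so absolute positions differ while pairwise distances are preserved.

```python
import math

def cosine_distance_matrix(vectors):
    """Pairwise cosine distances within a single embedding space."""
    def cos_dist(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return 1.0 - dot / (norm_u * norm_v)
    return [[cos_dist(u, v) for v in vectors] for u in vectors]

def gw_cost(C_src, C_tgt, coupling):
    """Squared-loss GW objective for a fixed coupling.

    Sums (C_src[i][k] - C_tgt[j][l])**2 * coupling[i][j] * coupling[k][l]
    over all index quadruples (i, j, k, l).
    """
    n, m = len(C_src), len(C_tgt)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if coupling[i][j] == 0.0:
                continue
            for k in range(n):
                for l in range(m):
                    if coupling[k][l] == 0.0:
                        continue
                    diff = C_src[i][k] - C_tgt[j][l]
                    total += diff * diff * coupling[i][j] * coupling[k][l]
    return total

def permutation_coupling(perm):
    """Uniform-mass coupling sending source word i to target word perm[i]."""
    n = len(perm)
    return [[1.0 / n if perm[i] == j else 0.0 for j in range(n)]
            for i in range(n)]

def rotate(v, angle):
    """Rotate a 2-D vector by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

# Hypothetical toy "source language": four 2-D word vectors.
source = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-0.6, 0.8]]

# Hypothetical "target language": the same points, rotated and reordered.
true_perm = [2, 0, 3, 1]  # source word i corresponds to target word true_perm[i]
target = [None] * len(source)
for i, p in enumerate(true_perm):
    target[p] = rotate(source[i], 1.0)

C_src = cosine_distance_matrix(source)
C_tgt = cosine_distance_matrix(target)

cost_true = gw_cost(C_src, C_tgt, permutation_coupling(true_perm))
cost_identity = gw_cost(C_src, C_tgt, permutation_coupling([0, 1, 2, 3]))

print(f"GW cost, correct matching:  {cost_true:.6f}")
print(f"GW cost, identity matching: {cost_identity:.6f}")
```

Because the objective only compares within-space distances, the correct matching attains a cost of essentially zero even though no source vector coincides with its rotated counterpart, while the mismatched coupling is penalized. An alignment method in this spirit would search over couplings to minimize this cost rather than evaluate a given one.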
