A Survey Of Cross-lingual Word Embedding Models

Sebastian Ruder, Ivan Vulić, Anders Søgaard

TL;DR

This survey organizes cross-lingual word embedding models into a unifying typology based on the alignment signal and supervision they require (word-, sentence-, or document-level; parallel or comparable data). It shows that many approaches optimize essentially the same objectives and differ mainly in their training data and optimization strategy, and it presents mapping-based, pseudo-bilingual, and joint learning approaches as connected paradigms. The authors provide historical context, discuss evaluation frameworks and benchmarks, and describe how bilingual models extend to multilingual settings, including pivot-language strategies. They also outline practical challenges and future directions, such as incorporating subword information, handling multi-word expressions and polysemy, and developing robust unsupervised methods, emphasizing data quality and compatibility over architectural novelty. Overall, the work offers a comprehensive, standardized view of cross-lingual embeddings and points future research toward data-centric improvements and multilingual scalability.

Abstract

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.

Paper Structure

This paper contains 77 sections, 2 theorems, 51 equations, 6 figures, and 5 tables.

Key Result

Lemma 1

Pseudo-bilingual sampling is, in the limit, equivalent to Constrained Bilingual SGNS (skip-gram with negative sampling).
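Pseudo-bilingual sampling, the left-hand side of Lemma 1, can be illustrated with a minimal sketch. It assumes the common recipe of building a mixed-language corpus by replacing each word that has an entry in a bilingual lexicon with a randomly chosen translation with probability `p`, and then extracting skip-gram (center, context) pairs from the result. The function names `make_pseudo_bilingual` and `skipgram_pairs`, the probability `p`, and the toy English-Spanish lexicon are hypothetical illustrations rather than the survey's notation, and the sketch does not reproduce the lemma's proof.

```python
import random
from typing import Dict, List, Optional, Tuple


def make_pseudo_bilingual(
    sentence: List[str],
    lexicon: Dict[str, List[str]],
    p: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Hypothetical helper: replace each word that has lexicon entries with a
    random translation with probability p; other words are kept unchanged."""
    rng = rng or random.Random()
    out = []
    for word in sentence:
        translations = lexicon.get(word)
        if translations and rng.random() < p:
            out.append(rng.choice(translations))
        else:
            out.append(word)
    return out


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Enumerate (center, context) pairs within a symmetric window, as a
    skip-gram model would consume them."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Toy usage with a hypothetical English-Spanish lexicon.
lexicon = {"the": ["el"], "dog": ["perro"], "barks": ["ladra"]}
mixed = make_pseudo_bilingual(["the", "dog", "barks"], lexicon, p=0.5, rng=random.Random(0))
print(mixed)
print(skipgram_pairs(mixed, window=1))
```

Because a replaced word keeps its original neighbours, a translation such as `perro` is trained against the same contexts as `dog`; this shared-context effect is the usual intuition for why pseudo-bilingual training pulls translation pairs together in one space.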

Figures (6)

  • Figure 1: Unaligned monolingual word embeddings (left) and word embeddings projected into a joint cross-lingual embedding space (right). Embeddings are visualized with t-SNE.
  • Figure 2: Examples of the nature and type of alignment of data sources. Par.: parallel. Comp.: comparable. Doc.: document. From left to right: word-level parallel alignment in the form of a bilingual lexicon, word-level comparable alignment using images obtained with Google search queries, sentence-level parallel alignment with translations, sentence-level comparable alignment using translations of several image captions, and document-level comparable alignment using similar documents.
  • Figure 3: Similar geometric relations between numbers and animals in English and Spanish (Mikolov et al., 2013b). Word embeddings were projected to two dimensions using PCA and manually rotated to emphasize the similarities.
  • Figure 4: Learning shared multilingual embedding spaces via linear mapping. (a) Starting from monolingual spaces in $L$ languages, one linearly maps $L-1$ of them into a chosen pivot monolingual space (typically English); (b) starting from bilingual spaces that share a language (typically English), one learns mappings from all other English subspaces into a chosen pivot English subspace and then applies each learned mapping to the rest of its bilingual space.
  • Figure 5: Illustration of the joint multilingual model of Duong et al. (2017) based on the modified CBOW objective; instead of predicting only the English word given the English context, the model also tries to predict its translations in all the remaining languages (i.e., in languages for which the translations exist in any of the input bilingual lexicons).
  • ...and 1 more figure
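Figure 4(a) describes the pivot-based multilingual setup, in which each of the $L-1$ non-pivot spaces is mapped linearly into the pivot space. The sketch below is one minimal way to do this, assuming the linear map is solved with orthogonal Procrustes and that each seed bilingual lexicon is given as aligned row indices. `learn_orthogonal_map`, `map_into_pivot`, and the synthetic toy spaces are hypothetical illustrations, not the survey's implementation.

```python
from typing import Dict, Tuple

import numpy as np


def learn_orthogonal_map(X: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: argmin_W ||X W - Z||_F subject to W^T W = I.

    Row i of X (source language) and row i of Z (pivot language) are the
    embeddings of the i-th translation pair in a seed lexicon.
    """
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt


def map_into_pivot(
    spaces: Dict[str, np.ndarray],
    seed_pairs: Dict[str, Tuple[np.ndarray, np.ndarray]],
    pivot: str,
) -> Dict[str, np.ndarray]:
    """Map every non-pivot space into the pivot space (hypothetical helper).

    spaces: language -> (vocab_size x d) embedding matrix
    seed_pairs: language -> (source row indices, pivot row indices)
    """
    mapped = {pivot: spaces[pivot]}
    for lang, E in spaces.items():
        if lang == pivot:
            continue
        src_idx, piv_idx = seed_pairs[lang]
        W = learn_orthogonal_map(E[src_idx], spaces[pivot][piv_idx])
        mapped[lang] = E @ W  # apply the learned map to the whole vocabulary
    return mapped


# Toy usage on synthetic data: "es" is an exact rotation of "en",
# so the learned map should undo the rotation.
rng = np.random.default_rng(0)
n, d = 50, 4
en = rng.normal(size=(n, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
es = en @ Q.T
seed = {"es": (np.arange(n), np.arange(n))}  # row i of es translates row i of en
mapped = map_into_pivot({"en": en, "es": es}, seed, pivot="en")
print(np.allclose(mapped["es"], en))  # True on this noise-free toy example
```

Orthogonal Procrustes is used here because it has a closed-form SVD solution and preserves distances within each monolingual space; an unconstrained least-squares map, as in Mikolov et al. (2013b), is another common choice for the same setting.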

Theorems & Definitions (3)

  • Lemma 1
  • Lemma 2
  • proof