Offline bilingual word vectors, orthogonal transformations and the inverted softmax

Samuel L. Smith, David H. P. Turban, Steven Hamblin, Nils Y. Hammerla

TL;DR

This work casts the offline alignment of bilingual word vectors as an orthogonal Procrustes problem, solved with a single SVD, so that the map placing the two languages in a shared space is orthogonal. It introduces an inverted softmax to counter hubness, and demonstrates robustness by using pseudo-dictionaries built from identical character strings and by leveraging aligned Europarl sentences for sentence-level translation. The method improves on Mikolov's original offline mapping, raising precision @1 for English→Italian word translation from 34% to 43%, and reaching 40% without any expert bilingual signal. It also retrieves the true translations of English sentences from 200k Italian candidates with a precision @1 of 68%. Overall, the paper places offline bilingual mapping in a single orthogonal framework and extends it to sentence-level retrieval.
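As a minimal sketch of the SVD step, assume the training dictionary is given as two row-aligned NumPy arrays, `X` (source-language vectors) and `Y` (target-language vectors). The names `normalize_rows` and `orthogonal_map`, the row-vector convention, and the unit-length normalisation are illustrative assumptions, not details taken from the summary above.

```python
import numpy as np

def normalize_rows(M):
    """Scale each row vector to unit Euclidean length (assumed preprocessing)."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def orthogonal_map(X, Y):
    """Return the orthogonal matrix O minimising ||X @ O - Y||_F.

    X, Y: arrays of shape (n_pairs, dim); row k of X is paired with row k of Y.
    This is the standard orthogonal Procrustes solution: take the SVD of
    X.T @ Y = U @ diag(s) @ Vt and drop the singular values.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Source vectors would then be mapped with `X @ O`. Because `O` is orthogonal, vector lengths and dot products within the source language are unchanged by the mapping.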

Abstract

Usually bilingual word vectors are trained "online". Mikolov et al. showed they can also be found "offline", whereby two pre-trained embeddings are aligned with a linear transformation, using dictionaries compiled from expert knowledge. In this work, we prove that the linear transformation between two spaces should be orthogonal. This transformation can be obtained using the singular value decomposition. We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision @1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian. Orthogonal transformations are more robust to noise, enabling us to learn the transformation without expert bilingual signal by constructing a "pseudo-dictionary" from the identical character strings which appear in both languages, achieving 40% precision on the same test set. Finally, we extend our method to retrieve the true translations of English sentences from a corpus of 200k Italian sentences with a precision @1 of 68%.
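The idea behind the inverted softmax can be illustrated with a small sketch. It assumes a similarity matrix `S` whose entry `S[j, i]` is the similarity between mapped source word `j` and target word `i`, and an inverse temperature `beta`. The function name, the array layout and the default value of `beta` are hypothetical choices made for this example. Where an ordinary softmax would normalise each source word's scores over all target words, the version below normalises each target word's scores over the source words, so a "hub" target that is close to many source words is down-weighted. Each row is then rescaled to sum to one.

```python
import numpy as np

def inverted_softmax(S, beta=10.0):
    """Translation probabilities P[j, i] for source word j -> target word i.

    S: (n_source, n_target) similarity matrix (layout assumed for this sketch).
    beta: inverse temperature (illustrative default, not a tuned value).
    """
    Z = beta * S
    Z = Z - Z.max(axis=0, keepdims=True)      # stabilise the exponentials per target
    E = np.exp(Z)
    P = E / E.sum(axis=0, keepdims=True)      # normalise over source words for each target
    return P / P.sum(axis=1, keepdims=True)   # rescale so each source row sums to one

# Toy example: target 0 is a hub, equally similar to both source words.
S = np.array([[0.9, 0.8],
              [0.9, 0.2]])
nearest = S.argmax(axis=1)                    # plain nearest neighbour picks the hub twice
inverted = inverted_softmax(S).argmax(axis=1) # source 0 is reassigned to target 1
```

In this toy case, nearest-neighbour retrieval assigns both source words to the hub target, while the inverted softmax assigns each source word a different target.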

Paper Structure

This paper contains 16 sections, 10 equations, 1 figure, 8 tables.

Figures (1)

  • Figure 1: A 2D plane through an English-Italian semantic space, before and after applying the SVD on the word vectors discussed below, using a training dictionary of 5000 translation pairs. The examples above were not used during training, but the SVD aligns the translations remarkably well.