Learning Translations via Matrix Completion

Derry Wijaya; Brendan Callahan; John Hewitt; Jie Gao; Xiao Ling; Marianna Apidianaki; Chris Callison-Burch

TL;DR

The paper addresses bilingual lexicon induction under limited parallel data by framing translation learning as matrix completion over multiple noisy signals. It models predicted translations as $\hat{X} = P Q^T$ and combines bilingual signals (WIKI, WIKI+CROWD) with auxiliary monolingual and visual cues under a Bayesian Personalized Ranking objective, which handles positive-only data and enables multilingual transfer. A back-off strategy covers cold-start words, and both linear and nonlinear mappings of monolingual embeddings are explored. The reported results show consistent top-10 accuracy improvements across 27 languages and state-of-the-art performance. The design is modular and extensible, and the code and datasets are publicly released.
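As a rough illustration of the formulation summarized above, the following is a minimal sketch, not the authors' released code, of fitting $\hat{X} = P Q^T$ with a BPR-style pairwise update on positive-only data. The function name `train_bpr`, the toy word indices, the hyperparameters, and the uniform negative sampling are all assumptions made for illustration; auxiliary signals and the cold-start back-off are omitted.

```python
import numpy as np

def train_bpr(positives, n_src, n_tgt, dim=8, lr=0.1, reg=0.01, epochs=300, seed=0):
    """Fit P (n_src x dim) and Q (n_tgt x dim) so that P @ Q.T ranks each
    observed translation above unobserved candidates (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_src, dim))
    Q = 0.1 * rng.standard_normal((n_tgt, dim))

    # Observed (positive-only) translations per source word.
    observed = {}
    for e, f in positives:
        observed.setdefault(e, set()).add(f)

    for _ in range(epochs):
        for idx in rng.permutation(len(positives)):
            e, f_pos = positives[idx]
            # Sample an unobserved target as the negative example.
            # Assumes every source word has at least one unobserved target.
            f_neg = int(rng.integers(n_tgt))
            while f_neg in observed[e]:
                f_neg = int(rng.integers(n_tgt))

            p_e = P[e].copy()
            diff = Q[f_pos] - Q[f_neg]
            x = p_e @ diff                   # score margin x_{e,pos} - x_{e,neg}
            g = 1.0 / (1.0 + np.exp(x))      # d/dx ln sigmoid(x) = sigmoid(-x)

            # Stochastic gradient ascent on ln sigmoid(x) with L2 regularization.
            P[e] += lr * (g * diff - reg * p_e)
            Q[f_pos] += lr * (g * p_e - reg * Q[f_pos])
            Q[f_neg] += lr * (-g * p_e - reg * Q[f_neg])
    return P, Q

# Toy data: 4 source words, 6 target words, a handful of seed translations.
positives = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 4)]
P, Q = train_bpr(positives, n_src=4, n_tgt=6)
X_hat = P @ Q.T                              # completed score matrix
best = X_hat.argmax(axis=1)                  # top-ranked target per source word
```

The pairwise update only needs to know which translations were observed, which is why this kind of objective suits positive-only data: unobserved cells are treated as ranked lower, not as confirmed non-translations.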

Abstract

Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel corpora. We model this task as a matrix completion problem and present an effective and extendable framework for completing the matrix. This method harnesses diverse bilingual and monolingual signals, each of which may be incomplete or noisy. Our model achieves state-of-the-art performance for both high- and low-resource languages.

Paper Structure (19 sections, 12 equations, 6 figures, 3 tables)

Introduction
Related Work
Bilingual Lexicon Induction
Bayesian Personalized Ranking (BPR)
Method
Problem Formulation
Bilingual Signals for Translation
Auxiliary Signals for Translation
Learning with Bayesian Personalized Ranking
Experiments
Data
Test sets
Bilingual Signals for Translation
Monolingual Signals for Translation
Bilingually Informed Word Embeddings
...and 4 more sections

Figures (6)

Figure 1: Our framework allows us to use a diverse range of signals to learn translations, including incomplete bilingual dictionaries, information from related languages (like Indonesian loan words from Dutch shown here), word embeddings, and even visual similarity cues.
Figure 2: The word tidur (id) is a cold word with no associated translation in the matrix. Auxiliary features $\theta_f$ about the words can be used to predict translations for cold words.
Figure 3: Wikipedia pages with observed translations to the source (id) and the target (en) languages act as a third language in the matrix.
Figure 4: Five images for the French word eau and its top 4 translations, ranked using visual similarities of images associated with English words (Bergsma and Van Durme, 2011).
Figure 5: $Acc_{10}$ on CROWDTest across all 27 languages shows that adding more and better signals for translation improves translation accuracy. The top accuracies achieved by our model, BPR_WE, vary across languages and appear to be influenced by the amount of data (Wikipedia tokens and seed translations) and the tokenization available for the language.
...and 1 more figures
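
The $Acc_{10}$ metric in the Figure 5 caption can be sketched as follows, assuming it denotes standard top-k accuracy with k = 10: the fraction of test source words for which at least one gold translation appears among the k highest-scoring candidates. The function name `top_k_accuracy` and the toy scores are hypothetical and do not come from the paper.

```python
import numpy as np

def top_k_accuracy(scores, gold, k=10):
    """Fraction of source words with a gold translation in their top-k targets.

    scores: (n_src, n_tgt) array of predicted translation scores.
    gold:   dict mapping a source index to a set of correct target indices.
    """
    hits = 0
    for e, gold_targets in gold.items():
        top_k = np.argsort(-scores[e])[:k]   # indices of the k highest scores
        if gold_targets & set(top_k.tolist()):
            hits += 1
    return hits / len(gold)

# Toy score matrix: 3 source words, 3 candidate targets.
scores = np.array([[0.9, 0.1, 0.3],
                   [0.2, 0.8, 0.5],
                   [0.6, 0.4, 0.1]])
gold = {0: {0}, 1: {2}, 2: {1}}

acc_1 = top_k_accuracy(scores, gold, k=1)
acc_2 = top_k_accuracy(scores, gold, k=2)
```

On this toy input only the first source word has its gold translation ranked first, while all three have it within the top two, so the score rises as k grows.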
