A Discriminative Latent-Variable Model for Bilingual Lexicon Induction

Sebastian Ruder, Ryan Cotterell, Yova Kementchedjhieva, Anders Søgaard

TL;DR

We propose a novel discriminative latent-variable model for bilingual lexicon induction that combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a state-of-the-art embedding-based approach, and we derive an efficient Viterbi EM algorithm to train it.

Abstract

We introduce a novel discriminative latent-variable model for bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a representation-based approach (Artetxe et al., 2017). To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical results on six language pairs under two metrics and show that the prior improves the induced bilingual lexicons. We also demonstrate how previous work may be viewed as a similarly fashioned latent-variable model, albeit with a different prior.

Paper Structure

This paper contains 34 sections, 4 theorems, 17 equations, 3 figures, 5 tables, 1 algorithm.

Key Result

Proposition 4.1

The optimization problem $\mathop{\mathrm{argmax}}\limits_{{\boldsymbol{m}} \in {\mathcal{M}}} \log {p_{\boldsymbol{\theta}}}({\boldsymbol{m}} \mid S, T)$ can be solved in ${\mathcal{O}}(({n_{\textit{src}}}+{n_{\textit{trg}}})^3)$ time with the Hungarian algorithm (Kuhn, 1955).
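
To make the proposition concrete, here is a minimal sketch of how a highest-scoring one-to-one matching can be found with a Hungarian-style algorithm. It is an illustration, not the authors' implementation. The function name `best_matching` and the toy score matrix are hypothetical. The sketch also assumes that every source word is matched and that there are at least as many target words as source words, whereas the matchings in the paper may leave words unmatched.

```python
def best_matching(scores):
    """Return (src, trg) index pairs that maximize the total score.

    `scores[i][j]` is a hypothetical compatibility score between source
    word i and target word j. Assumes len(scores) <= len(scores[0]).
    Uses the potentials-based form of the Hungarian algorithm on the
    negated scores, so a minimum-cost assignment is a maximum-score one.
    """
    n, m = len(scores), len(scores[0])
    if n > m:
        raise ValueError("expected at least as many targets as sources")
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (m + 1)      # column potentials (1-indexed)
    match = [0] * (m + 1)    # match[j] = row assigned to column j, 0 = none
    way = [0] * (m + 1)      # back-pointers along the augmenting path

    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta, j1 = inf, 0
            for j in range(1, m + 1):
                if used[j]:
                    continue
                # Reduced cost of the edge (i0, j) under negated scores.
                cur = -scores[i0 - 1][j - 1] - u[i0] - v[j]
                if cur < minv[j]:
                    minv[j] = cur
                    way[j] = j0
                if minv[j] < delta:
                    delta, j1 = minv[j], j
            # Shift potentials so that a new tight edge appears.
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the assignments along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1

    return sorted((match[j] - 1, j - 1) for j in range(1, m + 1) if match[j])


# Toy example: picking the best target greedily for source 0 would choose
# target 0 (0.9) and force source 1 onto target 1 (0.1), for a total of 1.0.
# The optimal matching swaps them, for a total of 1.65.
toy_scores = [[0.9, 0.8],
              [0.85, 0.1]]
pairs = best_matching(toy_scores)
print(pairs)  # [(0, 1), (1, 0)]
```

The toy example shows why a global matching can differ from independent nearest-neighbour choices: the one-to-one constraint forces the two source words to share the available targets.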

Figures (3)

  • Figure 1: Partial lexicons of German and English shown as a bipartite graph. German is the target language and English is the source language. The ${n_{\textit{trg}}}=7$ German words are shown in blue and the ${n_{\textit{src}}}=6$ English words are shown in green. A bipartite matching ${\boldsymbol{m}}$ between the two sets of vertices is also depicted. The German nodes in ${\boldsymbol{u}_\textit{trg}}$ are unmatched.
  • Figure 2: Bilingual dictionary induction results of our method and baselines for English–Italian with a 5,000-word seed lexicon across different vocabulary sizes.
  • Figure 3: Bilingual dictionary induction results of our method with different priors using a 5,000-word seed lexicon across different vocabulary sizes.

Theorems & Definitions (6)

  • Proposition 4.1
  • Proposition 4.2
  • Proposition A.1
  • proof
  • Proposition A.1
  • proof