Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages

Carlos Mullov, Ngoc-Quan Pham, Alexander Waibel

TL;DR

The paper tackles zero-shot translation from unseen languages by decoupling vocabulary learning from NMT training and aligning per-language word embeddings into a shared hub space. Because the cross-lingual word embeddings stay frozen during NMT training, the model can translate from an unseen language in a plug-and-play fashion once that language's aligned embeddings are supplied. The same capability also benefits unsupervised MT via back-translation. Empirical results show strong zero-shot performance on closely related languages and competitive unsupervised MT results with limited monolingual data, while analyses show that both the number of training languages (multilinguality) and the domain matter. The method offers a practical path to extend multilingual MT to many low-resource languages at lower adaptation cost and with less data.
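The hub-space alignment can be illustrated with a small sketch. It assumes orthogonal Procrustes alignment fitted on a seed dictionary, a common way to build cross-lingual word embeddings; the summary does not say which alignment method the paper uses. To stay dependency-free, the sketch works in two dimensions, where the best rotation has a closed form (real fastText vectors have hundreds of dimensions and need an SVD). All words, vectors, and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec2 = Tuple[float, float]


def fit_rotation(src: List[Vec2], hub: List[Vec2]) -> float:
    """2-D orthogonal Procrustes restricted to rotations.

    Returns the angle theta of the rotation R that minimizes
    sum_i ||R src[i] - hub[i]||^2 over the seed-dictionary pairs.
    """
    a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(src, hub))
    b = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(src, hub))
    return math.atan2(b, a)


def rotate(v: Vec2, theta: float) -> Vec2:
    """Apply the 2-D rotation by angle theta to v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def nearest(v: Vec2, table: Dict[str, Vec2]) -> str:
    """Return the hub word whose vector has the highest cosine with v."""
    def cos(u: Vec2) -> float:
        return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return max(table, key=lambda w: cos(table[w]))


# Hypothetical hub-space (English) vectors.
hub = {"house": (1.0, 0.0), "dog": (0.0, 1.0), "water": (0.6, 0.8)}
# Hypothetical, independently trained vectors for a new language: same
# geometry as the hub, but rotated by -90 degrees.
new = {"casa": (0.0, -1.0), "cão": (1.0, 0.0), "água": (0.8, -0.6)}

# Fit on a two-word seed dictionary, then map a word outside it.
seed = [("casa", "house"), ("cão", "dog")]
theta = fit_rotation([new[s] for s, _ in seed], [hub[t] for _, t in seed])
print(nearest(rotate(new["água"], theta), hub))  # → water
```

Restricting the map to a rotation preserves distances inside each language's space, so only the relative orientation of the two spaces is learned.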

Abstract

Multilingual neural machine translation systems learn to map sentences of different languages into a common representation space. Intuitively, as the number of seen languages grows, the encoder's sentence representation becomes more flexible and more easily adaptable to new languages. In this work, we test this hypothesis by zero-shot translating from unseen languages. To deal with the unknown vocabularies of unseen languages, we propose a setup that decouples the learning of vocabulary and syntax: for each language, we learn word representations in a separate step (using cross-lingual word embeddings), and then train to translate while keeping those word representations frozen. We demonstrate that this setup enables zero-shot translation from entirely unseen languages. Zero-shot translating with a model trained on Germanic and Romance languages, we achieve scores of 42.6 BLEU for Portuguese-English and 20.7 BLEU for Russian-English on the TED domain. We explore how this zero-shot translation capability develops as the number of languages seen by the encoder varies. Lastly, we explore the effectiveness of our decoupled learning strategy for unsupervised machine translation. By exploiting our model's zero-shot translation capability for iterative back-translation, we attain near parity with a supervised setting.
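The iterative back-translation loop can be sketched as follows, assuming the standard alternating scheme: the zero-shot X-to-English model produces synthetic English for monolingual X text, which trains an English-to-X model, whose output in turn retrains X-to-English. The function names, the `train` callback (a stand-in for a full NMT training run), and the toy lexicon are hypothetical, not the paper's code.

```python
from typing import Callable, List, Tuple

Translate = Callable[[str], str]
Pair = Tuple[str, str]  # (source sentence, target sentence)
Train = Callable[[List[Pair]], Translate]


def iterative_back_translation(
    mono_x: List[str],
    mono_en: List[str],
    x_to_en: Translate,
    train: Train,
    rounds: int,
) -> Tuple[Translate, Translate]:
    """Bootstrap X<->English from a zero-shot X-to-English model.

    `train` is a hypothetical stand-in for a full NMT training run.
    """
    if rounds < 1:
        raise ValueError("need at least one round")
    for _ in range(rounds):
        # Translate real X text to English, then train English-to-X on
        # (synthetic English source, real X target) pairs.
        en_to_x = train([(x_to_en(x), x) for x in mono_x])
        # Translate real English text to X with the new model, then retrain
        # X-to-English on (synthetic X source, real English target) pairs.
        x_to_en = train([(en_to_x(en), en) for en in mono_en])
    return x_to_en, en_to_x


# Toy usage: a word-by-word "zero-shot" model and a trainer that merely
# memorizes sentence pairs (both hypothetical, for illustration only).
LEXICON = {"o": "the", "cão": "dog", "dorme": "sleeps"}


def zero_shot(sentence: str) -> str:
    return " ".join(LEXICON.get(w, w) for w in sentence.split())


def memorize(pairs: List[Pair]) -> Translate:
    table = dict(pairs)
    return lambda s: table.get(s, s)


x_to_en, en_to_x = iterative_back_translation(
    ["o cão dorme"], ["the dog sleeps"], zero_shot, memorize, rounds=2
)
print(en_to_x("the dog sleeps"))  # → o cão dorme
```

Ordering each round as "X-to-English first" matters here: only the zero-shot direction exists at the start, so it must generate the first synthetic corpus.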

Paper Structure (33 sections, 1 equation, 2 figures, 16 tables)

Figures (2)

  • Figure 1: Our NMT architecture is a Transformer model whose embedding layers are pre-trained cross-lingual word embeddings (CLWE). The embedding vectors are obtained by aligning the monolingual fastText embeddings of each language into a common embedding space.
  • Figure 2: Our decoupled learning of word representations enables us to zero-shot translate from an unseen language $\ell_{\text{new}}$ in a plug-and-play fashion. The Transformer layers have no prior exposure to $\ell_{\text{new}}$. The cross-lingual word embeddings (CLWE) serve as a lookup table for $\ell_{\text{new}}$ words.
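The plug-and-play step in Figure 2 can be mimicked with a toy sketch. It assumes a single shared embedding space and replaces the Transformer layers with mean pooling, purely to show the data flow: the shared model only ever receives vectors, so supporting $\ell_{\text{new}}$ means registering one more aligned lookup table. The tables, vectors, and helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
DIM = 3  # toy size; real fastText-based CLWE are far larger

# Hypothetical aligned CLWE lookup tables, one per language, all living in
# the same shared space. The shared model below never sees language IDs.
CLWE: Dict[str, Dict[str, Vector]] = {
    "es": {  # a language seen during NMT training
        "el": (0.9, 0.1, 0.0),
        "perro": (0.1, 0.9, 0.1),
        "duerme": (0.0, 0.2, 0.9),
    },
}

UNK: Vector = (0.0,) * DIM  # toy fallback for out-of-vocabulary words


def embed(tokens: List[str], lang: str) -> List[Vector]:
    """Look up frozen CLWE vectors for each token of a sentence."""
    table = CLWE[lang]
    return [table.get(tok, UNK) for tok in tokens]


def shared_encoder(vectors: List[Vector]) -> Vector:
    """Toy stand-in for the trained Transformer encoder: mean pooling.

    It is language-agnostic: it only ever sees vectors, never words.
    """
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(DIM))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Plug and play: register the unseen language's aligned table. Nothing in
# shared_encoder is retrained or even touched.
CLWE["pt"] = {
    "o": (0.88, 0.12, 0.0),
    "cão": (0.1, 0.88, 0.12),
    "dorme": (0.02, 0.2, 0.88),
}

seen = shared_encoder(embed("el perro duerme".split(), "es"))
unseen = shared_encoder(embed("o cão dorme".split(), "pt"))
print(round(cosine(seen, unseen), 3))
```

Out-of-vocabulary words fall back to a zero vector here; that is a simplification for the sketch, not the paper's handling of unknown words.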