From English To Foreign Languages: Transferring Pre-trained Language Models

Ke Tran

TL;DR

The paper presents RAMEN, a resource-efficient approach to transfer pretrained English language models to other languages by initializing language-specific embeddings in the English space and jointly fine-tuning with a shared encoder. It demonstrates that RAMEN can surpass multilingual BERT baselines on zero-shot cross-lingual tasks (XNLI and UD) across six languages, especially when paired with RoBERTa backbones, and provides insight into how syntactic knowledge transfers more readily than semantic knowledge. The work highlights the importance of careful initialization (prefer aligned embeddings over random) and shows RAMEN's potential as both a zero-shot transfer model and a powerful feature extractor for supervised parsing. Overall, the method offers a fast, compute-friendly alternative to training multilingual models from scratch with meaningful practical impact for expanding NLP to more languages.

Abstract

Pre-trained models have demonstrated their effectiveness in many downstream natural language processing (NLP) tasks. The availability of multilingual pre-trained models enables zero-shot transfer of NLP tasks from high resource languages to low resource ones. However, recent research in improving pre-trained models focuses heavily on English. While it is possible to train the latest neural architectures for other languages from scratch, it is undesirable due to the required amount of compute. In this work, we tackle the problem of transferring an existing pre-trained model from English to other languages under a limited computational budget. With a single GPU, our approach can obtain a foreign BERT base model within a day and a foreign BERT large within two days. Furthermore, evaluating our models on six languages, we demonstrate that our models are better than multilingual BERT on two zero-shot tasks: natural language inference and dependency parsing.

Paper Structure

This paper contains 19 sections, 4 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Illustration of our two-step approach. In the first step, foreign embeddings are initialized in English space (§\ref{ssec:init_tgt_embs}). In the second step, we jointly fine-tune both English and foreign models (§\ref{ssec:tune_blm}).
  • Figure 2: Accuracy and LAS evaluated at each checkpoint.
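
The first step in Figure 1, initializing foreign embeddings in the English space, can be illustrated with a minimal sketch. It assumes each foreign word comes with non-negative alignment scores to English words and builds its vector as a weighted average of the corresponding English embeddings. The function name `init_foreign_embeddings`, the dictionary layout, the toy vocabulary, and the simple sum-to-one normalization are all assumptions made for illustration; this page does not say how the weights are obtained or normalized.

```python
from typing import Dict, List

Vector = List[float]


def init_foreign_embeddings(
    english_emb: Dict[str, Vector],
    alignment: Dict[str, Dict[str, float]],
) -> Dict[str, Vector]:
    """Place each foreign word in the English embedding space.

    alignment[f][e] is a non-negative score linking foreign word f to
    English word e (hypothetical input; where it comes from is an assumption).
    """
    dim = len(next(iter(english_emb.values())))
    foreign_emb: Dict[str, Vector] = {}
    for f_word, scores in alignment.items():
        # Keep only positive scores for English words that have an embedding.
        known = {e: s for e, s in scores.items() if e in english_emb and s > 0}
        total = sum(known.values())
        if total == 0:
            continue  # no usable alignment: leave this word to another initializer
        vec = [0.0] * dim
        for e_word, score in known.items():
            weight = score / total  # normalize so the weights sum to 1
            for k, value in enumerate(english_emb[e_word]):
                vec[k] += weight * value
        foreign_emb[f_word] = vec
    return foreign_emb


# Toy data, purely illustrative.
english = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
scores = {"chat": {"cat": 3.0, "dog": 1.0}, "chien": {"dog": 2.0}}
foreign = init_foreign_embeddings(english, scores)
```

With this toy input, `foreign["chat"]` lies between the vectors for "cat" and "dog", closer to "cat". Because the foreign vectors start out in the same space as the English ones, the pretrained encoder sees inputs it can already interpret before the joint fine-tuning of the second step begins.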