InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, Heyan Huang, Ming Zhou

TL;DR

InfoXLM frames cross-lingual pre-training as maximizing mutual information (MI) between multilingual views of text. Under this view, it unifies the existing multilingual masked language modeling (MMLM) and translation language modeling (TLM) tasks, and it introduces XlCo, a cross-lingual contrastive task implemented with momentum contrast (MoCo), mixup, and universal-layer contrast. Trained jointly on monolingual and parallel data, InfoXLM achieves state-of-the-art results on XNLI, Tatoeba, and MLQA, with improved cross-lingual transferability and representations. The work offers both a principled MI perspective and practical pre-training objectives for multilingual transfer.

Abstract

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, inspired by the framework, we propose a new pre-training task based on contrastive learning. Specifically, we regard a bilingual sentence pair as two views of the same meaning and encourage their encoded representations to be more similar than the negative examples. By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models. Experimental results on several benchmarks show that our approach achieves considerably better performance. The code and pre-trained models are available at https://aka.ms/infoxlm.
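The contrastive idea in the abstract (a translation pair should score higher than the negative examples) can be illustrated with a small InfoNCE-style loss. This is a minimal sketch under stated assumptions, not the authors' implementation: negatives are taken from the other sentences in the batch, similarity is a plain dot product, and the momentum queue and mixup named in the TL;DR are omitted. The function names, the toy two-dimensional "embeddings", and the temperature values are all hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(queries, keys, temperature=0.05):
    """Mean InfoNCE-style loss over a batch.

    queries[i] and keys[i] are assumed to encode a translation pair;
    every other key in the batch serves as a negative (an assumption
    made for this sketch).
    """
    losses = []
    for i, query in enumerate(queries):
        logits = [dot(query, key) / temperature for key in keys]
        # Numerically stable log-sum-exp over all candidate keys.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the true translation.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy example: two "English" vectors and their "French" counterparts.
en = [[1.0, 0.0], [0.0, 1.0]]
fr_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each translation is close to its source
fr_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # translations swapped

aligned_loss = info_nce_loss(en, fr_aligned, temperature=1.0)
shuffled_loss = info_nce_loss(en, fr_shuffled, temperature=1.0)
print(aligned_loss < shuffled_loss)
```

Minimizing this loss pushes each sentence toward its translation and away from the other sentences in the batch. When all keys are identical the loss equals log N for a batch of N, which is the chance level.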

Paper Structure

This paper contains 38 sections, 7 equations, 1 figure, 13 tables.

Figures (1)

  • Figure 1: Evaluation results of different layers on Tatoeba cross-lingual sentence retrieval.