mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset

Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

TL;DR

Addresses multilingual IR data scarcity by translating MS MARCO into 13 languages to create mMARCO. Builds mono- and multilingual rerankers (mT5, mMiniLM) and a dense retriever (mColBERT) trained on the translated data, and evaluates zero-shot transfer on Mr. TyDi. Findings show that multilingual finetuning surpasses English-only finetuning in zero-shot settings, that a distilled MiniLM is competitive with larger models, and that translation quality correlates positively with retrieval effectiveness. The dataset and models are released to support further multilingual IR research.

Abstract

The MS MARCO ranking dataset has been widely used for training deep learning models for IR tasks, achieving considerable effectiveness on diverse zero-shot scenarios. However, this type of resource is scarce in languages other than English. In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages that was created using machine translation. We evaluated mMARCO by finetuning monolingual and multilingual reranking models, as well as a multilingual dense retrieval model, on this dataset. We also evaluated models finetuned on mMARCO in a zero-shot scenario on the Mr. TyDi dataset, demonstrating that multilingual models finetuned on our translated dataset achieve superior effectiveness to models finetuned on the original English version alone. Our experiments also show that a distilled multilingual reranker is competitive with non-distilled models while having 5.4 times fewer parameters. Lastly, we show a positive correlation between translation quality and retrieval effectiveness, providing evidence that improvements in translation methods might lead to improvements in multilingual information retrieval. The translated datasets and finetuned models are available at https://github.com/unicamp-dl/mMARCO.

Paper Structure

This paper contains 11 sections, 1 figure, 5 tables.

Figures (1)

  • Figure 1: Translation quality measured as BLEU on Tatoeba vs retrieval quality measured as MRR@10 on mMARCO.
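The retrieval metric named in the Figure 1 caption, MRR@10, is the mean over queries of the reciprocal rank of the first relevant passage within the top 10 results, counting 0 when none appears. The following is a minimal sketch of that standard definition, not the authors' evaluation script; the function name `mrr_at_k` and the toy query and passage ids are assumptions made for illustration.

```python
from typing import Dict, List, Set


def mrr_at_k(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean reciprocal rank of the first relevant passage in the top k.

    rankings maps a query id to passage ids ordered best-first;
    relevant maps a query id to the set of relevant passage ids.
    Queries with no relevant passage in the top k contribute 0.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


# Toy data (hypothetical ids): q1 hits at rank 2, q2 misses, q3 hits at rank 1.
toy_rankings = {"q1": ["p3", "p1", "p7"], "q2": ["p9", "p4"], "q3": ["p2"]}
toy_relevant = {"q1": {"p1"}, "q2": {"p5"}, "q3": {"p2"}}
toy_score = mrr_at_k(toy_rankings, toy_relevant)  # (0.5 + 0.0 + 1.0) / 3
```

Because only the first relevant hit contributes, the metric rewards placing one correct passage near the top, which suits MS MARCO-style data where most queries have a single labelled relevant passage.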