Multilingual Open QA on the MIA Shared Task

Navya Yarrabelly; Saloni Mittal; Ketan Todi; Kimihiro Hasegawa

TL;DR

The paper addresses cross-lingual information retrieval and multilingual open QA in low-resource languages without supervision. It introduces a zero-shot Question-Generation based Re-ranking (QGPR) method that re-scores passages from a multilingual dense retriever by estimating $p(q|z)$ with a pretrained multilingual LM, and separately evaluates machine-translation based data augmentation. Experiments on XOR-TYDI-QA show that QGPR yields consistent gains in cross-lingual retrieval for several languages (notably Korean and Japanese), while MT augmentation produces mixed, often limited QA improvements likely due to context-length and translation quality constraints. Overall, the approach provides a training-free enhancement that can augment existing retrieval pipelines and offers insights into language-resource effects in cross-lingual open QA.

Abstract

Cross-lingual information retrieval (CLIR) \cite{shi2021cross, asai2021one, jiang2020cross} can find relevant text in any language, whether high-resource (e.g. English) or low-resource (e.g. Telugu), even when the query is posed in a different, possibly low-resource, language. In this work, we aim to develop useful CLIR models for this constrained yet important setting, in which we require no additional supervision or labelled data for the retrieval task and can therefore work effectively for low-resource languages.

We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot multilingual question generation model, a pre-trained language model, by computing the probability of the input question in the target language conditioned on a retrieved passage, which may be in a different language. We evaluate our method in a completely zero-shot setting; it does not require any training. The main advantage of our approach is therefore that it can be used to re-rank results obtained by any sparse retrieval method, such as BM25. This eliminates the need for the expensive labelled corpora otherwise required for retrieval tasks, so the method can be used for low-resource languages.
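The re-ranking step described in the abstract can be illustrated with a minimal sketch: each retrieved passage z is re-scored by log p(q | z), and the passages are sorted by that score. The function names below are hypothetical, and `toy_log_likelihood` is only a monolingual, smoothed unigram stand-in so the example runs without a model; in the setting described here, the scorer would instead wrap a pretrained multilingual language model and sum the log-probabilities of the question tokens conditioned on the passage.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rerank_by_question_likelihood(
    question: str,
    passages: Sequence[str],
    log_p_question_given_passage: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Sort retrieved passages by log p(q | z), highest score first."""
    scored = [(z, log_p_question_given_passage(question, z)) for z in passages]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_log_likelihood(question: str, passage: str) -> float:
    """Stand-in scorer (an assumption, NOT the paper's model).

    Returns the add-one-smoothed unigram log-probability of the question
    tokens under the passage's word counts.
    """
    passage_tokens = passage.lower().split()
    question_tokens = question.lower().split()
    vocab = set(passage_tokens) | set(question_tokens)
    total = len(passage_tokens) + len(vocab)
    return sum(
        math.log((passage_tokens.count(tok) + 1) / total)
        for tok in question_tokens
    )


# Passages as they might come back from a first-stage retriever.
retrieved = [
    "the eiffel tower is in paris",
    "mount fuji is the highest mountain in japan",
]
ranked = rerank_by_question_likelihood(
    "what is the highest mountain in japan", retrieved, toy_log_likelihood
)
for passage, score in ranked:
    print(f"{score:.3f}  {passage}")
```

Passing the scorer in as a callable keeps the re-ranker independent of the first-stage retriever and of the language model, which mirrors the training-free, plug-in nature of the method: only the scoring function would need to change to move from this toy to a multilingual LM.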

Paper Structure (21 sections, 4 equations, 1 figure, 5 tables)

Introduction
Cross-lingual information retrieval
Multilingual Question Answering
Related Work
Multilingual QA
Cross Lingual Retrieval
Baseline
Multilingual Dense Passage Retriever (mDPR)
Multilingual Question Answering
Methods
Question-Generation based Re-ranking (QGPR)
Machine-Translation-based Data Augmentation
Experimental Details
Question-Generation based Re-ranking
Machine-Translation-based Data Augmentation
...and 6 more sections

Figures (1)

Figure 1: Block Diagram of our model architecture.
