Cross-lingual QA: A Key to Unlocking In-context Cross-lingual Performance

Sunkyoung Kim, Dayeon Ki, Yireun Kim, Jinsik Lee

TL;DR

This work addresses cross-lingual transfer in multilingual LLMs, where translating entire in-context examples is costly and can compromise contextual integrity. It introduces Cross-lingual QA prompting, which keeps each passage in the source language and translates only the question-answer pair into the target language, reducing the translation burden. Across four typologically diverse multilingual benchmarks (XNLI, XCOPA, MLQA, XQuAD), this prompting strategy outperforms prior monolingual prompting approaches, and its gains grow as model size increases. The findings point to a scalable, cost-efficient way to improve cross-lingual in-context learning in open-source multilingual models, with potential extensions to English-centric LLMs.

Abstract

Multilingual large language models (MLLMs) have demonstrated significant cross-lingual capabilities through in-context learning. Existing approaches typically construct monolingual in-context examples, either in the source or target language. However, translating entire in-context examples into the target language might compromise contextual integrity and be costly in the case of long-context passages. To address this, we introduce Cross-lingual QA, a cross-lingual prompting method that translates only the question and answer parts, thus reducing translation costs. Experiments on four typologically diverse multilingual benchmarks show that Cross-lingual QA prompting effectively stimulates models to elicit their cross-lingual knowledge, outperforming prior monolingual prompting approaches. Furthermore, we show that prompting open-source MLLMs with cross-lingual in-context examples enhances performance as the model scale increases.
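
As a rough illustration of the prompting scheme described above, the following minimal sketch assembles a prompt in which each in-context passage stays in the source language while its question and answer are translated, and the test example is supplied already in the target language. The `Example` class, the `Passage:`/`Question:`/`Answer:` template, and the `translate` callable are all hypothetical names chosen for illustration; the source does not specify a template or a translation system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Example:
    passage: str
    question: str
    answer: str = ""  # left empty for the test example


def render(ex: Example) -> str:
    # Hypothetical template: the field labels are an assumption, not the paper's.
    return f"Passage: {ex.passage}\nQuestion: {ex.question}\nAnswer: {ex.answer}".rstrip()


def build_prompt(
    demos: List[Example],
    test: Example,
    translate: Callable[[str], str],
) -> str:
    """Concatenate k in-context examples and one test example into a prompt."""
    blocks = []
    for demo in demos:
        blocks.append(
            render(
                Example(
                    passage=demo.passage,                # kept in the source language
                    question=translate(demo.question),   # translated to the target language
                    answer=translate(demo.answer),       # translated to the target language
                )
            )
        )
    # The test example is assumed to be in the target language already.
    blocks.append(render(Example(test.passage, test.question)))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Toy lookup table standing in for a real translation system (assumption).
    toy = {
        "Where is the Eiffel Tower?": "¿Dónde está la Torre Eiffel?",
        "Paris": "París",
    }
    demos = [Example("The Eiffel Tower is in Paris.", "Where is the Eiffel Tower?", "Paris")]
    test = Example("El Prado está en Madrid.", "¿Dónde está el Prado?")
    print(build_prompt(demos, test, lambda s: toy.get(s, s)))
```

Only the short question and answer strings pass through `translate`, so translation cost scales with the length of the QA pair rather than the length of the passage.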

Paper Structure (19 sections, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Overview of the Cross-lingual QA prompting method. In each in-context example, the passage remains in the source language while the question and answer are translated into the target language. Gray text in brackets represents English translations. The test example is always presented in the target language. We concatenate $k$ in-context examples and the test example to form the prompt, which is then fed into a multilingual LLM.
  • Figure 2: Average XQuAD performance of each prompting method across model scales. Model sizes on the $x$-axis are approximate values for each model size variant. Left: XGLM, Right: BLOOM.