Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation

Suhang Wu, Jialong Tang, Baosong Yang, Ante Wang, Kaidi Jia, Jiawei Yu, Junfeng Yao, Jinsong Su

TL;DR

Experiments on multilingual RALMs reveal linguistic inequalities, yield advice for improving multilingual Retrieval-Augmented Generation and cross-lingual knowledge transfer, and underscore the complexities inherent in multilingual RALMs.

Abstract

RALMs (Retrieval-Augmented Language Models) broaden their knowledge scope by incorporating external textual resources. However, the multilingual nature of global knowledge requires RALMs to handle diverse languages, a topic that has received limited research attention. In this work, we propose Futurepedia, a carefully crafted benchmark containing parallel texts across eight representative languages. We evaluate six multilingual RALMs on this benchmark to explore the challenges they face. Experimental results reveal linguistic inequalities: 1) high-resource languages stand out in monolingual knowledge extraction; 2) Indo-European languages lead RALMs to provide answers directly from documents, alleviating the challenge of expressing answers across languages; 3) English benefits from RALMs' selection bias and speaks louder in multilingual knowledge selection. Based on these findings, we offer advice for improving multilingual Retrieval-Augmented Generation. For monolingual knowledge extraction, careful attention must be paid to cascading errors from translating low-resource languages into high-resource ones. In cross-lingual knowledge transfer, encouraging RALMs to provide answers within documents in different languages can improve transfer performance. For multilingual knowledge selection, incorporating more non-English documents and repositioning English documents can help mitigate RALMs' selection bias. Through comprehensive experiments, we underscore the complexities inherent in multilingual RALMs and offer valuable insights for future research.

Paper Structure

This paper contains 27 sections, 2 equations, 13 figures, 5 tables.

Figures (13)

  • Figure 1: The refinement process on the collected data.
  • Figure 2: Three evaluation tasks in our benchmark: (a) Monolingual knowledge extraction, which requires RALMs to extract knowledge from documents and resolve questions within the same language; (b) Cross-lingual knowledge transfer, which challenges RALMs to handle documents and QA pairs in different languages; (c) Multilingual knowledge selection, which presents documents in various languages that contain different answers, allowing for the evaluation of RALMs' selection bias. Note that we use three of the eight languages, English (en), Chinese (zh), and French (fr), to illustrate these tasks, and we provide the English translations in parentheses.
  • Figure 3: Performance of RALMs in monolingual knowledge extraction. Note that Chinese and English are relatively high-resource languages, while Arabic is a relatively low-resource language.
  • Figure 4: The performance of RALMs on cross-lingual knowledge transfer. The x-axis represents the document language, and the y-axis represents the query language. The first/second values represent the RALM performance in the Flexible/Strict Language Setting. The colors indicate the performance of the strict language setting, with deeper blues representing stronger performance. Note that results for other RALMs can be found in Appendix A.
  • Figure 5: The performance of RALMs on the task of multilingual knowledge selection. The x-axis and y-axis represent the answer and query languages, respectively. Note that results for other RALMs can be found in Appendix A.
  • ...and 8 more figures
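
The three evaluation tasks described in the Figure 2 caption differ mainly in how query and document languages are paired. The following is a minimal sketch of that pairing, not the authors' implementation: the `Setting` class, the task labels, and `enumerate_settings` are hypothetical names introduced here for illustration, and only the three languages used in the figure (en, zh, fr) are listed. It also assumes, for simplicity, that the selection task shows one document per language.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative subset: the three languages Figure 2 uses as examples.
LANGS = ["en", "zh", "fr"]


@dataclass(frozen=True)
class Setting:
    """Hypothetical description of one evaluation configuration."""
    task: str
    query_lang: str
    doc_langs: tuple


def enumerate_settings(langs):
    """Enumerate query/document language pairings for the three tasks."""
    settings = []
    # Single-document tasks: same language means monolingual extraction,
    # different languages means cross-lingual transfer.
    for query_lang, doc_lang in product(langs, repeat=2):
        if query_lang == doc_lang:
            task = "monolingual_extraction"
        else:
            task = "cross_lingual_transfer"
        settings.append(Setting(task, query_lang, (doc_lang,)))
    # Selection task (assumption): documents in every language are shown
    # together, each carrying a different answer.
    for query_lang in langs:
        settings.append(Setting("multilingual_selection", query_lang, tuple(langs)))
    return settings


settings = enumerate_settings(LANGS)
for s in settings:
    print(s.task, s.query_lang, s.doc_langs)
```

With three languages this yields three monolingual settings, six cross-lingual settings, and three selection settings; the same enumeration extends to the benchmark's eight languages by lengthening the list.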