
Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval

Olivia Macmillan-Scott, Roksana Goworek, Eda B. Özyiğit

TL;DR

This paper analyzes generative query expansion (QE) for cross-lingual information retrieval (CLIR) using multilingual LLMs. It compares Aya Expanse 8B and Gemma 3 (4B/12B) under four prompting strategies, two translation/expansion orders, and two retrieval formulations on CLIRMatrix and mMARCO. It shows that query length dictates the most effective prompting technique: zero-shot prompting works best for short queries and few-shot prompting benefits longer queries, while pre-translation expansion and concatenation generally improve recall. Fine-tuning on CLIR-specific data yields gains on CLIRMatrix but can hurt performance on mMARCO, underscoring that the benefits of fine-tuning depend on the dataset and its style. The study also uncovers substantial cross-language disparities, with larger relative gains for languages with weaker baselines and non-Latin scripts, highlighting the need for balanced multilingual resources and careful evaluation. Overall, generative cross-lingual QE is promising for CLIR, but its effectiveness depends on model choice, prompt design, and language characteristics.
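The pipeline settings above can be made concrete with a minimal sketch. It assumes that the two orders mean expanding the query before or after translating it, and that the two retrieval formulations are (a) concatenating the translated query with the generated expansion and (b) using the expansion alone; the paper's exact definitions may differ. `build_retrieval_query`, `translate` and `expand` are hypothetical names, with `translate` and `expand` standing in for calls to an mLLM such as Aya Expanse 8B or Gemma 3.

```python
from typing import Callable

# Hypothetical stand-ins for LLM calls: any function mapping text -> text.
TextFn = Callable[[str], str]


def build_retrieval_query(
    query: str,
    translate: TextFn,
    expand: TextFn,
    order: str = "expand_then_translate",
    formulation: str = "concatenate",
) -> str:
    """Assemble the text sent to the retriever under one assumed pipeline setting.

    order:
      "expand_then_translate": generate the expansion in the query language,
                               then translate both query and expansion.
      "translate_then_expand": translate the query first, then expand it
                               in the document language.
    formulation:
      "concatenate":    translated query followed by the expansion.
      "expansion_only": the expansion replaces the query.
    """
    if order == "expand_then_translate":
        expansion = translate(expand(query))
        translated_query = translate(query)
    elif order == "translate_then_expand":
        translated_query = translate(query)
        expansion = expand(translated_query)
    else:
        raise ValueError(f"unknown order: {order}")

    if formulation == "concatenate":
        return f"{translated_query} {expansion}"
    if formulation == "expansion_only":
        return expansion
    raise ValueError(f"unknown formulation: {formulation}")
```

Passing simple string functions (for example `str.upper` as a mock translator) is enough to check which text reaches the retriever under each of the four order/formulation combinations before wiring in real model calls.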

Abstract

Query expansion is the reformulation of a user query by adding semantically related information. It is an essential component of monolingual and cross-lingual information retrieval, helping to ensure that relevant documents are not missed. Recently, multilingual large language models (mLLMs) have shifted query expansion from semantic augmentation with synonyms and related words to pseudo-document generation. Pseudo-documents both introduce additional relevant terms and bridge the gap between short queries and long documents, which is particularly beneficial in dense retrieval. This study evaluates recent mLLMs and fine-tuned variants across several generative expansion strategies to identify the factors that drive cross-lingual retrieval performance. Results show that query length largely determines which prompting technique is effective, and that more elaborate prompts often yield no further gains. Substantial linguistic disparities persist: cross-lingual query expansion can produce the largest improvements for languages with the weakest baselines, yet retrieval is especially poor between languages written in different scripts. Fine-tuning leads to performance gains only when the training and test data are of similar format. These outcomes underline the need for more balanced multilingual and cross-lingual training and evaluation resources.
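The difference between zero-shot and few-shot prompting for pseudo-document generation comes down to whether worked examples precede the query. A minimal sketch follows; the instruction wording, the `Query:`/`Passage:` layout and the helper names are hypothetical and are not the prompts used in the paper.

```python
from typing import Sequence, Tuple

# Hypothetical instruction; the paper's actual prompt wording is not reproduced here.
INSTRUCTION = "Write a short passage in {target_lang} that answers the following query."


def zero_shot_prompt(query: str, target_lang: str) -> str:
    """Instruction plus the query, with no demonstrations."""
    header = INSTRUCTION.format(target_lang=target_lang)
    return f"{header}\nQuery: {query}\nPassage:"


def few_shot_prompt(
    query: str,
    target_lang: str,
    examples: Sequence[Tuple[str, str]],
) -> str:
    """Same instruction, preceded by (query, passage) demonstrations."""
    header = INSTRUCTION.format(target_lang=target_lang)
    demos = "\n\n".join(f"Query: {q}\nPassage: {p}" for q, p in examples)
    return f"{header}\n\n{demos}\n\nQuery: {query}\nPassage:"
```

In either case the returned string would be sent to the generation model, and the generated passage would serve as the pseudo-document used for expansion.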

Paper Structure

This paper contains 33 sections, 9 figures, 14 tables.

Figures (9)

  • Figure 1: Overview of an information retrieval pipeline. Query expansion (along with question answering) is one of the stages that has benefited most from recent developments in transformer-based models.
  • Figure 2: Zero-shot query expansion example for the mMARCO dataset (Bonifacio et al., 2022).
  • Figure 3: Retrieval performance on CLIRMatrix across languages using cross-lingual query expansion, measured by Hit@10.
  • Figure 4: Retrieval performance on mMARCO across languages using cross-lingual query expansion, measured by Hit@10.
  • Figure 5: Cross-lingual query expansion results with Aya Expanse 8B for CLIRMatrix, change from original baseline query.
  • ...and 4 more figures
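Figures 3 and 4 report Hit@10, and Figure 5 reports change from the original baseline query. A minimal sketch of both computations follows, assuming Hit@k is the fraction of queries with at least one relevant document among the top k results. Both absolute and relative change are returned because the caption does not say which is plotted. The function names and the toy run format (query ID mapped to a ranked list of document IDs) are hypothetical.

```python
import math
from typing import Dict, List, Set


def hit_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    if not ranked:
        return 0.0
    hits = sum(
        1
        for qid, docs in ranked.items()
        if any(doc in relevant.get(qid, set()) for doc in docs[:k])
    )
    return hits / len(ranked)


def change_from_baseline(expanded: float, baseline: float) -> Dict[str, float]:
    """Absolute and relative change of a score versus the unexpanded query."""
    absolute = expanded - baseline
    # Relative change is undefined when the baseline score is zero.
    relative = absolute / baseline if baseline != 0 else math.nan
    return {"absolute": absolute, "relative": relative}


# Toy run: q1 retrieves its relevant document at rank 2, q2 misses entirely.
ranked = {"q1": ["d3", "d1"], "q2": ["d5"]}
relevant = {"q1": {"d1"}, "q2": {"d9"}}
print(hit_at_k(ranked, relevant, k=10))  # 0.5
```

The relative figure shows why the same absolute improvement reads differently across languages: +0.1 Hit@10 is a 50% gain over a 0.2 baseline but only 12.5% over a 0.8 baseline, so relative gains tend to be largest where baselines are weakest.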