Preserving Multilingual Quality While Tuning Query Encoder on English Only

Oleg Vasilyev; Randy Sawaya; John Bohannon

TL;DR

The paper asks whether fine-tuning a high-quality multilingual query encoder on English-only data degrades cross-lingual retrieval performance. Using the multilingual E5 encoder, the authors tune only the query encoder on MSMARCO while keeping the document embeddings fixed. They observe that multilingual alignment is preserved or even improved, and that quality holds up on both English-only and cross-lingual datasets. They propose adiabatic tuning, in which a very low learning rate helps retain pretrained properties that the tuning does not target. Learning rates of $2\times 10^{-8}$ to $6\times 10^{-8}$ are identified as particularly effective for E5, and freezing the output.dense.weight parameters can extend this safe regime to around $1.3\times 10^{-7}$. The work points to a resource-efficient pathway for domain- or query-type adaptation in multilingual retrieval, and suggests a more general hypothesis: careful, slow tuning can preserve broad system properties beyond the tuning objective.
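As a rough illustration of the partially frozen setup, the sketch below picks which query-encoder parameters stay trainable when the embedding block and every output.dense.weight are frozen. The parameter names are hypothetical (a BERT-style layout is assumed), the helper `is_trainable` is invented for illustration, and matching on the name suffix is an assumption: it also freezes the attention output projection, which may or may not be what the authors intended. The learning-rate constants are the values quoted in the summary above.

```python
# Minimal sketch, not the authors' code: choose trainable parameters by name.

# Learning rates quoted in the summary for E5.
ADIABATIC_LR_RANGE = (2e-8, 6e-8)
# Approximate upper value reported when output.dense.weight is frozen.
EXTENDED_LR = 1.3e-7


def is_trainable(name: str, freeze_output_dense: bool = True) -> bool:
    """Return False for parameters that should stay frozen during tuning."""
    # Assumption: the embedding block is everything under "embeddings.".
    if name.startswith("embeddings."):
        return False
    # Assumption: suffix match; this also catches attention.output.dense.weight.
    if freeze_output_dense and name.endswith("output.dense.weight"):
        return False
    return True


# Illustrative parameter names only (hypothetical BERT-style naming).
example_names = [
    "embeddings.word_embeddings.weight",
    "encoder.layer.0.attention.self.query.weight",
    "encoder.layer.0.attention.output.dense.weight",
    "encoder.layer.0.output.dense.weight",
    "encoder.layer.0.output.dense.bias",
]

trainable = [n for n in example_names if is_trainable(n)]
```

In a real training loop, the same predicate could be used to set the gradient flag on each named parameter of the query encoder before building the optimizer with a learning rate from the range above.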

Abstract

The query encoder of a dual-encoder passage retrieval system can be tuned for specific types of queries or domains, while the precomputed and stored document representations are kept intact. Switching from one query encoder to another when needed is easy, unlike overhauling the embeddings of a whole knowledge base. In this work we raise a question: Can the generic, original qualities of the encoder be preserved, or at least not degraded too much, when it is tuned on a narrow domain? We conducted experiments on a high-quality multilingual embedding model. Tuning it on a single English-only dataset, we observe that the tuning not only preserves the multilingual qualities, but even improves them. The embedding qualities on distinctly different data are also improved or at least preserved. Drawing on our observations, we suggest a more general hypothesis: Tuning with an intentionally low learning rate can preserve or improve a system's properties that were acquired in training but are not specifically targeted by the tuning. We call this adiabatic tuning and provide tentative explanations.
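To make the tuning objective concrete, here is a minimal sketch of a triplet margin loss on cosine similarity, in which only the query vector would change during tuning and the document vectors are treated as fixed, precomputed inputs. The exact loss used by the authors is not given on this page, so the form below is an assumption; the margin of 0.1 and the use of cosine are taken from the figure captions, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_margin_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.1,  # value taken from the figure captions
) -> float:
    """Assumed loss form: penalize triplets where the negative document is
    not at least `margin` less similar to the query than the positive one.

    `positive` and `negative` stand for stored document embeddings, which
    stay fixed; only the encoder producing `query` would be tuned.
    """
    return max(0.0, margin - cosine(query, positive) + cosine(query, negative))
```

A triplet whose positive document is already well separated from the negative one contributes zero loss, so with a very low learning rate each step only nudges the query encoder on the triplets that still violate the margin.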

Paper Structure (36 sections, 2 equations, 14 figures, 20 tables)

Introduction
Setup
Models
Datasets
Tuning and evaluations
Observations
Tuning partially frozen query model
Learning rate and adiabatic tuning
Extending adiabatic tuning range
Conclusion
Usage of MSMARCO Triplets
ARXIV Dataset for Triplets
Dataset arxiv-negatives
How is it created?
SQUAD
...and 21 more sections

Figures (14)

Figure 1: Improvement of $E5$ on XNLI assessed by cosine. Query is on axis $Y$; text is on $X$.
Figure 2: Evaluations on (a) XNLI and (b) the English-only datasets (MSMARCO and ARXIV) of the E5 query encoder tuned with a frozen embedding block, batch size 14, margin 0.1 using different learning rates. Values that did not pass the two-tailed test are shown with open markers.
Figure 3: Evaluations of the $E5$ query encoder tuned with a frozen embedding block and all layers 'output.dense.weight', with batch size 14, margin 0.1 using different learning rates on (a) XNLI and (b) the English-only datasets (MSMARCO and ARXIV). Values that did not pass the two-tailed test are shown with open markers.
Figure 4: PND of embedding models on XNLI entailment-neutral comparisons assessed by cosine.
Figure 5: PND of embedding models on XNLI entailment-contradiction comparisons assessed by cosine.
...and 9 more figures
