Preserving Multilingual Quality While Tuning Query Encoder on English Only
Oleg Vasilyev, Randy Sawaya, John Bohannon
TL;DR
The paper asks whether fine-tuning a high-quality multilingual query encoder on English-only data degrades cross-language retrieval performance. Using the multilingual E5 encoder, the authors tune only the query encoder on MSMARCO while keeping document embeddings fixed, and observe that multilingual alignment is preserved or even improved, with robust results on both English-only and cross-lingual datasets. They propose adiabatic tuning, in which very small learning rates help retain pretrained properties that the tuning does not target. Learning rates from $2\times 10^{-8}$ to $6\times 10^{-8}$ are identified as particularly effective for E5, and freezing output.dense.weight extends this safe regime to around $1.3\times 10^{-7}$. The work demonstrates a resource-efficient path to domain- or query-type adaptation in multilingual retrieval and highlights a general principle: careful, slow tuning can preserve broad system properties beyond the tuning objective.
Abstract
The query encoder of a dual passage retrieval system can be tuned for specific types of queries or domains, while the precomputed and stored document representations are kept intact. Switching from one query encoder to another when needed is easy, unlike overhauling the embeddings of a whole knowledge base. In this work we ask: can the generic, original qualities of the encoder be preserved, or at least not degraded too much, when it is tuned on a narrow domain? We conducted experiments on a high-quality multilingual embedding model: tuning it on a single English-only dataset, we observe that the tuning not only preserves the multilingual qualities but even improves them. The embedding qualities on distinctly different data are also improved or at least preserved. Drawing on our observations, we suggest a more general hypothesis: tuning with an intentionally low learning rate can preserve or improve properties of a system that were acquired in training but are not specifically targeted by tuning. We call this adiabatic tuning and provide tentative explanations.
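The setup described above (update only the query encoder, keep stored document embeddings fixed, use a very small learning rate, optionally freeze part of the encoder) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation. The linear "encoder", the softmax cross-entropy loss over candidate documents, the function names, and the `frozen_rows` argument are all assumptions made for illustration. `frozen_rows` is only a loose stand-in for freezing a parameter such as output.dense.weight. The learning rate $2\times 10^{-8}$ is taken from the range reported for E5.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode_query(weights, x):
    # Toy linear query encoder (hypothetical): one weight row per output dimension.
    return [dot(row, x) for row in weights]


def adiabatic_step(weights, x, doc_embs, pos_idx, lr, frozen_rows=()):
    """One SGD step on softmax cross-entropy over query-document dot products.

    Document embeddings are read but never modified; rows of the query
    encoder listed in frozen_rows are copied through unchanged.
    """
    q = encode_query(weights, x)
    scores = [dot(q, d) for d in doc_embs]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Gradient of the loss w.r.t. the query embedding: sum_j (p_j - y_j) * d_j
    grad_q = [
        sum((probs[j] - (1.0 if j == pos_idx else 0.0)) * doc_embs[j][k]
            for j in range(len(doc_embs)))
        for k in range(len(q))
    ]
    new_weights = []
    for k, row in enumerate(weights):
        if k in frozen_rows:
            new_weights.append(list(row))  # frozen: no update
        else:
            # dL/dW[k][i] = grad_q[k] * x[i]
            new_weights.append([w - lr * grad_q[k] * xi for w, xi in zip(row, x)])
    loss = -math.log(probs[pos_idx])
    return new_weights, loss


random.seed(0)
dim = 4
# Identity initialisation stands in for the pretrained query encoder (assumption).
weights0 = [[1.0 if i == k else 0.0 for i in range(dim)] for k in range(dim)]
# Precomputed document embeddings; document 0 is the relevant one.
doc_embs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
docs_before = [list(d) for d in doc_embs]
x = [random.gauss(0, 1) for _ in range(dim)]  # toy query features

weights = weights0
for _ in range(100):
    weights, loss = adiabatic_step(
        weights, x, doc_embs, pos_idx=0, lr=2e-8, frozen_rows={0}
    )

# Largest absolute change of any query-encoder weight after tuning.
max_drift = max(
    abs(a - b) for r, r0 in zip(weights, weights0) for a, b in zip(r, r0)
)
print("max weight drift:", max_drift)
```

In this toy setting the stored document embeddings are untouched and the query-encoder weights move only slightly from their initial values. That is the intuition behind the adiabatic regime, although the sketch says nothing about retrieval quality on real data.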
