Bridging the Language Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs

Somnath Kumar, Vaibhav Balloli, Mercy Ranjit, Kabir Ahuja, Sunayana Sitaram, Kalika Bali, Tanuja Ganu, Akshay Nambi

TL;DR

This work tackles multilingual QA by introducing a dynamic runtime framework that selects, per query, the optimal combination of prompt strategy, embedding model, and LLM. It pairs LLM generation with multilingual embeddings in a Retrieval-Augmented Generation setup and employs a lightweight Conv-ND head to predict per-configuration $F1$, enabling offline training and online adaptation without extensive fine-tuning. Through extensive evaluation on IndicQA and TyDiQA across 18 languages, the approach yields 10-15% improvements over pre-trained baselines and up to 4x gains over language-specific fine-tuned models, while maintaining adaptability to unseen languages and datasets. The paper also introduces GPTAnnotator to enrich ground-truth evaluation, proposes a Similar Language Algorithm to guide prompt choices, and provides detailed implementation and training instructions for deploying dynamic multilingual configurations in practice.

Abstract

Large language models (LLMs) have revolutionized various domains but still struggle with non-Latin scripts and low-resource languages. This paper addresses the critical challenge of improving multilingual performance without extensive fine-tuning. We introduce a novel dynamic learning approach that optimizes prompt strategy, embedding model, and LLM per query at runtime. By adapting configurations dynamically, our method achieves significant improvements over static, best, and random baselines. It operates efficiently in both offline and online settings, generalizing seamlessly across new languages and datasets. Leveraging Retrieval-Augmented Generation (RAG) with state-of-the-art multilingual embeddings, we achieve superior task performance across diverse linguistic contexts. Through systematic investigation and evaluation across 18 diverse languages using popular question-answering (QA) datasets, we show that our approach yields 10-15% improvements in multilingual performance over pre-trained models and 4x gains compared to fine-tuned, language-specific models.
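
The per-query selection described above can be illustrated with a minimal sketch: enumerate every (prompt strategy, embedding model, LLM) combination, score each one with a predictor of F1, and pick the highest-scoring configuration. This is not the authors' implementation. The option names, the `select_config` helper, and the `toy_predictor` scoring rule are all hypothetical stand-ins; in the paper the scores come from a learned Conv-ND head.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

# Hypothetical option names, for illustration only (not taken from the paper).
PROMPTS = ["native", "translate", "similar_language"]
EMBEDDINGS = ["embed_a", "embed_b"]
LLMS = ["llm_x", "llm_y"]

# A configuration is (prompt strategy, embedding model, LLM).
Config = Tuple[str, str, str]


def select_config(
    query: str,
    predict_f1: Callable[[str, Config], float],
    configs: Sequence[Config],
) -> Config:
    """Return the configuration with the highest predicted F1 for this query."""
    return max(configs, key=lambda cfg: predict_f1(query, cfg))


def toy_predictor(query: str, cfg: Config) -> float:
    """Hand-written stand-in for a learned F1 predictor (assumption)."""
    prompt, embedding, llm = cfg
    score = 0.5
    # Arbitrary rule: favor the "translate" prompt for non-ASCII queries.
    if not query.isascii() and prompt == "translate":
        score += 0.2
    if embedding == "embed_b":
        score += 0.1
    if llm == "llm_y":
        score += 0.05
    return score


# The full search space is the Cartesian product of the three option lists.
ALL_CONFIGS = list(product(PROMPTS, EMBEDDINGS, LLMS))

best = select_config("भारत की राजधानी क्या है?", toy_predictor, ALL_CONFIGS)
print(best)
```

Because selection is just an argmax over predicted scores, swapping in a trained predictor changes only the `predict_f1` argument; the search over configurations stays the same.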

Paper Structure (20 sections, 2 equations, 3 figures, 15 tables, 4 algorithms)