Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

Xuan-Phi Nguyen, Sharifah Mahani Aljunied, Shafiq Joty, Lidong Bing

TL;DR

This paper addresses the challenge of enabling LLMs to perform well in very low-resource languages by exploiting their English-dominant abilities through Linguistically-Diverse Prompting (LDP). LDP constructs in-context prompts from a diverse set of high-resource languages to drive cross-lingual translation (X→En, En→X, X→Y) and intra-lingual tasks, using synthetic exemplars and back-translation, with optional unsupervised fine-tuning. Across English-centric translation, non-English translation, zero-shot summarization, QA, and instruction following, LDP performs on par with or better than supervised prompting and conventional English pivoting, and it even lets smaller models approach the capabilities of much larger ones. The work demonstrates practical, data-efficient improvements for 34 low-resource languages and reports ablations that guide prompt design and language-tag usage, while acknowledging limitations and ethical considerations.
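The X→En prompt described above amounts to string assembly: fixed exemplars from several high-resource languages, each paired with its English translation, followed by the test input and an open English slot. The sketch below is a minimal illustration under stated assumptions: the `<Lang>: text` / `English: text` template, the `Exemplar` class, and `build_x_to_en_prompt` are hypothetical, and the LLM call that would consume the prompt is omitted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Exemplar:
    """One synthetic (source, English) pair drawn from a high-resource language."""
    lang: str     # language tag shown in the prompt, e.g. "French"
    source: str   # text in that language
    english: str  # its English translation


def build_x_to_en_prompt(shots: Sequence[Exemplar], test_lang: str, test_text: str) -> str:
    """Concatenate fixed, linguistically diverse shots followed by the test input.

    The prompt ends with an open "English:" slot, so the model's continuation
    is read as the translation. The template is an illustrative assumption,
    not the paper's verbatim format.
    """
    blocks = [f"{ex.lang}: {ex.source}\nEnglish: {ex.english}" for ex in shots]
    blocks.append(f"{test_lang}: {test_text}\nEnglish:")
    return "\n\n".join(blocks)


# Hypothetical shots; real LDP exemplars would span many high-resource languages.
shots = [
    Exemplar("French", "Bonjour.", "Hello."),
    Exemplar("Vietnamese", "Cảm ơn.", "Thank you."),
]
print(build_x_to_en_prompt(shots, "Igbo", "Daalụ."))
```

Because the shots are fixed and come from languages the model already handles well, the same prefix can be reused for any source language X; only the final input block changes.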

Abstract

Large language models (LLMs) are known to perform tasks effectively by simply observing a few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, so unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performance in under-represented languages falls behind due to pre-training data imbalance. To extend LLMs' abilities to low-resource languages without any supervised data, we propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English. These prompts are then used to create intra-lingual exemplars to perform tasks in the target languages. Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages. We also show that fine-tuning a 7B model on data generated by our method helps it perform competitively with a 175B model. In non-English translation tasks, our method even outperforms supervised prompting by up to 3 chrF++ in many low-resource languages. When evaluated on zero-shot multilingual summarization, our method surpasses other English-pivoting baselines by up to 4 ROUGE-L and is also favored by GPT-4.
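The step of turning X→En prompts into intra-lingual exemplars can be read as back-translation over unlabeled target-language text. The sketch below assumes a callable `translate_to_en` that wraps an X→En LDP prompt and an LLM; here it is stubbed with a dictionary so the example runs offline. The helper names, the `English:` / `<Lang>:` template, and the Igbo sample sentences are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (synthetic English, genuine target-language text)


def make_bt_exemplars(
    unlabeled_target: Sequence[str],
    translate_to_en: Callable[[str], str],
) -> List[Pair]:
    """Back-translate unlabeled target-language text into English.

    Only the English side is model output; the target side is real text.
    """
    return [(translate_to_en(x), x) for x in unlabeled_target]


def build_en_to_x_prompt(pairs: Sequence[Pair], target_lang: str, english_input: str) -> str:
    """Assemble an En->X prompt from intra-lingual back-translated exemplars."""
    blocks = [f"English: {en}\n{target_lang}: {x}" for en, x in pairs]
    blocks.append(f"English: {english_input}\n{target_lang}:")
    return "\n\n".join(blocks)


# A dictionary stub replaces the LLM-backed X->En translator.
stub = {"Ndewo.": "Hello.", "Daalụ.": "Thank you."}
pairs = make_bt_exemplars(["Ndewo.", "Daalụ."], lambda text: stub[text])
print(build_en_to_x_prompt(pairs, "Igbo", "Good morning."))
```

This arrangement keeps genuine target-language text on the output side of every exemplar, which plausibly helps the model stay in the intended output language (compare Figure 3).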

Paper Structure (36 sections, 5 equations, 8 figures, 11 tables)

Figures (8)

  • Figure 1: LDP prompting for unsupervised translation. (a) $\mathcal{F}_{\rightarrow en}$ translates from any language into English by concatenating the fixed linguistically-diverse shots with the input text, prompting the LLM to generate the correct translation. (b) Similarly, $\mathcal{F}_{\rightarrow ig}$ translates English into Igbo, but with low accuracy. (c) $\mathcal{F}^{bt}_{\rightarrow ig}$ translates English into Igbo using synthetic intra-lingual exemplars generated from unlabeled target-language data with $\mathcal{F}_{\rightarrow en}$.
  • Figure 2: Illustrations of LDP for $X\rightarrow$En, En$\rightarrow X$, and $X\rightarrow Y$ cross-lingual translation (a) and general intra-lingual tasks (b). For $X\rightarrow$En, the colored box [z] represents unlabeled text in language z, [en] represents its corresponding En translation, [x] stands for the test input in language x, and the uncolored box [$\hat{\text{en}}$] represents model outputs. For En$\rightarrow X$, [en$^{\text{x}}$] represents En text translated with $\mathcal{F}^{mt}_{\text{x}\rightarrow \text{en}}$. For $X\rightarrow Y$, [$\overline{\text{y}}^{\text{en}}$] represents text in language y translated from the En text [$\hat{\text{en}}^{\text{x}}$]. Similarly, for intra-lingual tasks such as summarization (b), [$\hat{\text{r}}_\text{z}$] represents a response in language z to the query [q$_\text{z}$].
  • Figure 3: Probabilities that the BLOOM model generates in the correct language for the En$\rightarrow X$ task using LDP without (a) and with (b) intra-lingual BT prompts. Rows are the languages the model is supposed to generate; columns are the languages it actually generates. "##" denotes other languages.
  • Figure 4: Gains achieved by fine-tuning BLOOM-7B with respect to the number of trainable parameters.
  • Figure 5: Coverage (%) of low-resource languages in the ROOTS corpus (roots_corpus) used to train BLOOM. The highest-resource Indic and African languages are Hindi and Swahili, respectively. Hindi accounts for $0.7$% of the corpus, while the rarest language, Tumbuka, accounts for only $2\times10^{-5}$%.
  • ...and 3 more figures
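The $X\rightarrow Y$ construction in Figure 2 can be sketched as a two-hop English pivot that is used only to build exemplars: each unlabeled $x$ is translated to English ($\hat{\text{en}}^{\text{x}}$), that English is translated into $y$ ($\overline{\text{y}}^{\text{en}}$), and the resulting $(x, \overline{\text{y}})$ pairs prompt direct $X\rightarrow Y$ translation. This is a minimal sketch assuming two LLM-backed callables, `x_to_en` and `en_to_y`, stubbed here with dictionaries; the helper names, prompt template, and Swahili/Igbo samples are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Translator = Callable[[str], str]


def make_x_to_y_exemplars(
    unlabeled_x: Sequence[str], x_to_en: Translator, en_to_y: Translator
) -> List[Tuple[str, str]]:
    """Build (x, y_bar) exemplar pairs by pivoting unlabeled X text through English."""
    pairs = []
    for x in unlabeled_x:
        en_hat = x_to_en(x)      # English translation of x
        y_bar = en_to_y(en_hat)  # language-y text translated from en_hat
        pairs.append((x, y_bar))
    return pairs


def build_x_to_y_prompt(
    pairs: Sequence[Tuple[str, str]], x_lang: str, y_lang: str, test_x: str
) -> str:
    """Direct X->Y prompt: English appears only while the exemplars are built."""
    blocks = [f"{x_lang}: {x}\n{y_lang}: {y}" for x, y in pairs]
    blocks.append(f"{x_lang}: {test_x}\n{y_lang}:")
    return "\n\n".join(blocks)


# Dictionary stubs stand in for the LLM-backed translators.
sw_to_en = {"Habari.": "Hello."}
en_to_ig = {"Hello.": "Ndewo."}
pairs = make_x_to_y_exemplars(["Habari."], lambda t: sw_to_en[t], lambda t: en_to_ig[t])
print(build_x_to_y_prompt(pairs, "Swahili", "Igbo", "Asante."))
```

At test time the model translates the input directly from X to Y, so translation errors from the pivot affect only the exemplars rather than every test sentence.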
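The quantity plotted in Figure 3 is a row-normalized matrix: for each intended output language (row), the fraction of outputs detected in each language (column), with untracked languages pooled under "##". A minimal sketch, assuming every generated output has already been labeled with a detected language code by some language-identification tool, which is not specified here; `language_confusion` and the sample codes are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

OTHER = "##"  # bucket for languages outside the tracked set, as in Figure 3


def language_confusion(
    pairs: Iterable[Tuple[str, str]], tracked: List[str]
) -> Dict[str, Dict[str, float]]:
    """Row-normalized matrix P(generated language | intended language).

    `pairs` holds one (intended, detected) language code per model output.
    """
    tracked_set = set(tracked)
    counts: Dict[str, Counter] = defaultdict(Counter)
    for intended, detected in pairs:
        column = detected if detected in tracked_set else OTHER
        counts[intended][column] += 1

    columns = [*tracked, OTHER]
    matrix = {}
    for intended, row in counts.items():
        total = sum(row.values())
        # Counter returns 0 for unseen columns, so every row covers all columns.
        matrix[intended] = {lang: row[lang] / total for lang in columns}
    return matrix


outputs = [("ig", "ig"), ("ig", "en"), ("ig", "ig"), ("ig", "yo"), ("sw", "sw")]
for intended, row in language_confusion(outputs, ["ig", "sw", "en"]).items():
    print(intended, row)
```

Here two of the four Igbo-intended outputs land in Igbo, one drifts to English, and one (Yoruba, untracked) falls into "##", giving the row 0.5 / 0.0 / 0.25 / 0.25.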