Chain-of-Dictionary Prompting Elicits Translation in Large Language Models

Hongyuan Lu; Haoran Yang; Haoyang Huang; Dongdong Zhang; Wai Lam; Furu Wei

TL;DR

This paper addresses the challenge of translating rare words in low-resource languages with large language models. It introduces Chain-of-Dictionary Prompting (CoD), which appends chains of multilingual dictionary translations to the translation prompt, giving the model lexical priors in a zero-shot setting. Experiments on FLORES-200 show large gains for many language directions, with CoD outperforming few-shot demonstrations and, in some cases, approaching or surpassing state-of-the-art translators. The approach is validated across multiple models, languages, and directions. Ablations confirm the importance of chaining and of auxiliary languages, and practical measures such as stopword truncation reduce compute. The work offers a viable, data-efficient method for improving multilingual neural machine translation (MNMT) in real-world, low-resource scenarios.

Abstract

Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT) even when trained without parallel data. Yet, despite the enormous amount of training data, they still struggle to translate rare words, particularly for low-resource languages. Even worse, it is usually unrealistic to retrieve relevant demonstrations for in-context learning with low-resource languages, which restricts the practical use of LLMs for translation. How should we mitigate this problem? To this end, we present a novel method, CoD, which augments LLMs with prior knowledge in the form of chains of multilingual dictionary entries for a subset of the input words, eliciting the translation abilities of LLMs. Extensive experiments indicate that augmenting ChatGPT with CoD yields large gains of up to 13x in chrF++ for MNMT (3.08 to 42.63 for English to Serbian written in Cyrillic script) on the full FLORES-200 devtest set. We further demonstrate the importance of chaining the multilingual dictionaries, as well as the superiority of CoD over few-shot demonstrations for low-resource languages.
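To make the idea of appending chained dictionary entries to a translation prompt concrete, the following is a minimal sketch of how such a prompt might be assembled. The function names (`build_chain`, `build_cod_prompt`), the exact template wording, and the toy dictionary are assumptions made for illustration. They are not taken from the paper, and the authors' actual prompt and dictionary source may differ.

```python
from typing import Dict, List, Optional


def build_chain(word: str, translations: Dict[str, str],
                languages: List[str]) -> Optional[str]:
    """Chain one source word through its translations, in the given language order.

    Returns None when the word has no translation in any of the requested
    languages, so that empty chains are never added to the prompt.
    """
    links = [f'"{translations[lang]}"' for lang in languages if lang in translations]
    if not links:
        return None
    return " means ".join([f'"{word}"'] + links) + "."


def build_cod_prompt(source_text: str, source_lang: str, target_lang: str,
                     dictionary: Dict[str, Dict[str, str]],
                     chain_languages: List[str]) -> str:
    """Standard translation prompt followed by one dictionary chain per word.

    The template below is a hypothetical wording, not the paper's exact prompt.
    """
    header = (f"Translate the following text from {source_lang} "
              f"into {target_lang}: {source_text}")
    chains = [build_chain(word, entry, chain_languages)
              for word, entry in dictionary.items()]
    return "\n".join([header] + [c for c in chains if c is not None])


# Toy example: English to French, with German and Spanish as auxiliary languages.
toy_dictionary = {
    "river": {"French": "rivière", "German": "Fluss", "Spanish": "río"},
    "wide": {},  # no entries available, so this word is skipped
}
prompt = build_cod_prompt("The river is wide.", "English", "French",
                          toy_dictionary, ["French", "German", "Spanish"])
print(prompt)
```

In this sketch the target language comes first in each chain and the auxiliary languages follow. That ordering is a design choice of the example, not a claim about the paper's configuration.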

Paper Structure (36 sections, 5 figures, 14 tables)

Introduction
Chain-of-Dictionary Prompting for Neural Machine Translation
Multilingual Dictionary
Experimental Setup
Baselines
Datasets and Evaluation Metrics
Dictionaries
Polysemy
Prompting Design
Results and Analysis
En-X Results
En-X: ChatGPT
En-X: Languages Improved on ChatGPT
En-X: Languages Not Improved on ChatGPT
En-X: Languages Selection
...and 21 more sections

Figures (5)

Figure 1: An illustration of CoD for English to Tamil translation. CoD consists of two sections: the standard translation prompt (the upper box) and the chained multilingual dictionaries. We highlight the chained dictionary part of CoD by language; it contains the words and their translations in different languages. CoD outperforms standard prompting in this example, and other methods such as conventional Chain-of-Thought have been shown to be less effective for MT (arXiv:2303.13780). We bold the text of the actual inputs/outputs. Non-bolded text is included only as explanation for the reader.
Figure 2: A comparison between the baseline ChatGPT (GPT-3.5-TURBO) and CoD on translation from English into 200 languages. Languages are sorted by their ChatGPT chrF++ score in descending order, and the figure is split into two parts for clarity: the first half is shown in the upper panel and the second half in the lower panel. CoD is effective for many languages, especially low-resource ones.
Figure 3: A case study on translating from English into Kikongo with Latin script, using GPT-4 in all cases. We evaluate the results with BLEU and chrF++. We highlight in green the words translated incorrectly by the baselines but correctly by CoD, even when the words are not present in the multilingual dictionary chains.
Figure 4: A case study on translating from English into Central Kurdish with Latin script, using GPT-4 in all cases. We evaluate the results with BLEU and chrF++. We highlight in green the words translated incorrectly by the baselines but correctly by CoD, even when the words are not present in the multilingual dictionary chains.
Figure 5: A case study on translating from English into Central Kurdish with Latin script, using GPT-3.5 in all cases. We evaluate the results with BLEU and chrF++. We highlight in green the words translated incorrectly by the baselines but correctly by CoD, even when the words are not present in the multilingual dictionary chains.
