Learning-From-Mistakes Prompting for Indigenous Language Translation

You-Cheng Liao, Chen-Jui Yu, Chi-Yi Lin, He-Feng Yun, Yen-Hsiang Wang, Hsiao-Min Li, Yao-Chung Fan

TL;DR

The paper addresses translating Chinese into extremely low-resource Taiwanese Indigenous languages using large language models (LLMs), a small parallel-corpus datastore, and a word-level dictionary. It introduces a three-stage prompting pipeline that progressively improves translations without updating model parameters: KNN-Prompting with Retrieved Prompting Context (KNN-RPC), Chain-of-Thought (CoT) prompting, and Learning-from-Mistakes (LFM) prompting. Key contributions include the KNN-RPC framework, CoT integration to exploit reasoning over the retrieved context, and LFM prompting, which incorporates past errors for refinement. The methods are validated by automatic metrics and expert reviews across six languages. The work demonstrates that LLMs can function as universal translators for unseen, resource-scarce languages and highlights practical needs such as dictionary expansion and native-speaker evaluation for further gains.

Abstract

This paper presents techniques that use large language models (LLMs) to improve translation for extremely low-resource Indigenous languages. Our approaches rely on (1) a datastore containing a limited number of parallel translation examples, (2) the inherent capabilities of LLMs such as GPT-3.5, and (3) a word-level translation dictionary. In this setting, we combine LLMs with in-context learning so that they can serve as universal translators for extremely low-resource languages. Our methodology treats LLMs as language compilers for selected language pairs, hypothesizing that they can internalize syntactic structures to facilitate accurate translation. We introduce three techniques: KNN-Prompting with Retrieved Prompting Context, Chain-of-Thought Prompting, and Learning-from-Mistakes Prompting, with the last method addressing past errors. The evaluation results suggest that, even with limited corpora, LLMs can effectively translate extremely low-resource languages when paired with proper prompting.
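
The abstract names three ingredients: a small parallel datastore, an LLM, and a word-level dictionary. The following is a minimal sketch of how the first and third might be combined into a retrieval-based prompt. It is not the authors' implementation: the character-bigram Jaccard similarity, the substring dictionary lookup, the prompt wording, and all names (`knn_retrieve`, `build_prompt`, `DATASTORE`, `DICTIONARY`) are assumptions made for illustration, and the target-language strings are placeholders rather than real translations.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data. Source sentences are Chinese; targets are placeholders.
DATASTORE: List[Tuple[str, str]] = [
    ("我喜歡吃飯", "<target-1>"),   # "I like eating"
    ("我喜歡唱歌", "<target-2>"),   # "I like singing"
    ("今天下雨", "<target-3>"),     # "It is raining today"
]
DICTIONARY: Dict[str, str] = {"喜歡": "<gloss-like>", "下雨": "<gloss-rain>"}


def char_bigrams(text: str) -> set:
    """Character bigrams of a string (assumed similarity feature)."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the bigram sets of two strings."""
    x, y = char_bigrams(a), char_bigrams(b)
    return len(x & y) / len(x | y)


def knn_retrieve(query: str, datastore: List[Tuple[str, str]],
                 k: int = 2) -> List[Tuple[str, str]]:
    """Return the k parallel pairs whose source side is most similar to query."""
    ranked = sorted(datastore, key=lambda pair: jaccard(query, pair[0]),
                    reverse=True)
    return ranked[:k]


def build_prompt(query: str, datastore: List[Tuple[str, str]],
                 dictionary: Dict[str, str], k: int = 2) -> str:
    """Assemble retrieved examples and dictionary hits into one prompt string."""
    lines = ["Translate the source sentence into the target language.",
             "Parallel examples:"]
    for src, tgt in knn_retrieve(query, datastore, k):
        lines.append(f"- {src} => {tgt}")
    lines.append("Dictionary entries:")
    # Simplification: substring match instead of proper word segmentation.
    for word, gloss in dictionary.items():
        if word in query:
            lines.append(f"- {word}: {gloss}")
    lines.append(f"Source: {query}")
    lines.append("Target:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("我喜歡跳舞", DATASTORE, DICTIONARY))
```

The resulting string would be sent to the LLM as context; no model parameters are updated, which matches the in-context-learning setting described above.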

Paper Structure (18 sections, 4 figures, 7 tables)

Figures (4)

  • Figure 1: Methodology Overview
  • Figure 2: KNN-Prompting with RPC
  • Figure 3: CoT KNN-Prompting. This example uses two CoT demonstrations. Each demonstration comprises (1) a sample sentence, (2) the RPC (Retrieved Prompting Context) for that sentence, and (3) the ground-truth sentence. The demonstrations are combined with the KNN-RPC prompting inputs to form the complete prompt for the LLM.
  • Figure 4: LFM Prompting
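
The Figure 3 caption describes each CoT demonstration as a triple (sample sentence, its RPC, ground-truth sentence) that is combined with the KNN-RPC inputs for the query. A rough sketch of that assembly is shown here. The field names, the prompt wording, and the ordering of blocks are assumptions. The optional `mistakes` section is only a guess at how past errors might be included for LFM prompting, since this page says no more than that LFM incorporates past errors. All strings are placeholders.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CoTDemo:
    sentence: str      # (1) sample source sentence
    rpc: str           # (2) retrieved prompting context for that sentence
    ground_truth: str  # (3) reference translation


@dataclass
class Mistake:
    # Hypothetical record of a past error; layout is an assumption.
    sentence: str
    wrong_output: str
    reference: str


def render_demo(index: int, demo: CoTDemo) -> str:
    """Format one demonstration as sentence, context, then ground truth."""
    return (f"Demonstration {index}\n"
            f"Sentence: {demo.sentence}\n"
            f"Retrieved context:\n{demo.rpc}\n"
            f"Ground truth: {demo.ground_truth}")


def render_mistake(m: Mistake) -> str:
    """Format one past error alongside its correct translation."""
    return (f"Sentence: {m.sentence}\n"
            f"Earlier incorrect translation: {m.wrong_output}\n"
            f"Correct translation: {m.reference}")


def assemble_prompt(demos: Sequence[CoTDemo], query: str, query_rpc: str,
                    mistakes: Sequence[Mistake] = ()) -> str:
    """Join demonstrations, optional past mistakes, and the query block."""
    blocks = [render_demo(i, d) for i, d in enumerate(demos, start=1)]
    if mistakes:
        blocks.append("Past mistakes to avoid:\n"
                      + "\n\n".join(render_mistake(m) for m in mistakes))
    blocks.append(f"Sentence: {query}\n"
                  f"Retrieved context:\n{query_rpc}\n"
                  "Translation:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [CoTDemo("<src-A>", "<rpc-A>", "<tgt-A>"),
             CoTDemo("<src-B>", "<rpc-B>", "<tgt-B>")]
    errors = [Mistake("<src-C>", "<bad-C>", "<tgt-C>")]
    print(assemble_prompt(demos, "<src-query>", "<rpc-query>", errors))
```

Placing the query block last leaves the prompt ending on an open "Translation:" slot for the model to complete; that ordering is a design choice of this sketch, not something stated in the source.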