Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages

Zhuoyuan Mao, Yen Yu

TL;DR

The paper tackles MT for unseen, low-resource languages by introducing AlignInstruct, a cross-lingual, word-alignment–based discriminator that provides supervision during MT fine-tuning. A baseline MTInstruct approach demonstrates that translation capabilities can be induced in unseen languages, and AlignInstruct consistently enhances performance across 48 directions and improves zero-shot translation in many cases, with discriminative instructions outperforming generative variants. Model size amplifies gains, and AlignInstruct reshapes early-layer cross-lingual representations in a manner consistent with multilingual encoder–decoder MT. The work offers a practical, data-efficient path to broaden MT coverage for low-resource languages and points to future work incorporating monolingual corpora and larger models to further improve cross-lingual translation capabilities.

Abstract

This article introduces contrastive alignment instructions (AlignInstruct) to address two challenges in machine translation (MT) on large language models (LLMs). One is the expansion of supported languages to previously unseen ones. The second relates to the lack of data in low-resource languages. Model fine-tuning through MT instructions (MTInstruct) is a straightforward approach to the first challenge. However, MTInstruct is limited by weak cross-lingual signals inherent in the second challenge. AlignInstruct emphasizes cross-lingual supervision via a cross-lingual discriminator built using statistical word alignments. Our results based on fine-tuning the BLOOMZ models (1b1, 3b, and 7b1) in up to 24 unseen languages showed that: (1) LLMs can effectively translate unseen languages using MTInstruct; (2) AlignInstruct led to consistent improvements in translation quality across 48 translation directions involving English; (3) Discriminator-based instructions outperformed their generative counterparts as cross-lingual instructions; (4) AlignInstruct improved performance in 30 zero-shot directions.
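
To make the discriminator idea concrete, here is a minimal sketch (not the authors' implementation) of how contrastive true/false instructions could be derived from word alignments. It assumes the alignments are already available as (source index, target index) pairs from a statistical aligner. The function name `build_align_examples`, the prompt wording, and the choice of one positive plus one negative example per sentence pair are illustrative assumptions.

```python
import random


def build_align_examples(src_tokens, tgt_tokens, alignments, rng):
    """Build one positive and one negative alignment-judgement example.

    alignments: iterable of (src_index, tgt_index) pairs, assumed to come
    from a statistical word aligner run on the parallel corpus.
    """
    # Work with surface word pairs so repeated words cannot yield a
    # "negative" that is textually identical to an aligned pair.
    aligned_words = {(src_tokens[i], tgt_tokens[j]) for i, j in alignments}
    if not aligned_words:
        return []

    def make(src_word, tgt_word, label):
        # Hypothetical prompt template, for illustration only.
        prompt = (
            "Given the parallel sentence pair, judge whether the assertion "
            "is True or False.\n"
            f"Source: {' '.join(src_tokens)}\n"
            f"Target: {' '.join(tgt_tokens)}\n"
            f'Assertion: "{src_word}" can be aligned with "{tgt_word}".'
        )
        return {"input": prompt, "output": "True" if label else "False"}

    # Positive example: a word pair the aligner linked.
    pos = rng.choice(sorted(aligned_words))
    examples = [make(pos[0], pos[1], True)]

    # Negative example: a word pair the aligner did not link.
    all_pairs = {(s, t) for s in src_tokens for t in tgt_tokens}
    negatives = sorted(all_pairs - aligned_words)
    if negatives:
        neg = rng.choice(negatives)
        examples.append(make(neg[0], neg[1], False))
    return examples


if __name__ == "__main__":
    rng = random.Random(0)
    src = ["the", "cat", "sleeps"]
    tgt = ["le", "chat", "dort"]
    for ex in build_align_examples(src, tgt, [(0, 0), (1, 1), (2, 2)], rng):
        print(ex["input"])
        print("->", ex["output"])
```

Examples of this kind would be mixed with the ordinary translation instructions during fine-tuning, so the model receives an explicit word-level cross-lingual signal in addition to sentence-level translation pairs.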

Paper Structure (26 sections, 5 figures, 19 tables)

Figures (5)

  • Figure 1: Average chrF++ scores of BLOOMZ models across 24 unseen languages, comparing three settings: no fine-tuning, fine-tuning with MTInstruct, and fine-tuning with MTInstruct combined with AlignInstruct.
  • Figure 2: Proposed instruction tuning methods combining MTInstruct (Sec. 2.1) and AlignInstruct (Sec. 2.2) for LLMs in MT tasks. $\oplus$ denotes combining multiple instruction patterns with a specific fine-tuning curriculum (Sec. 3.2). IBM Model 2 refers to the word alignment model from statistical machine translation (Brown et al., 1993).
  • Figure 3: Differences in cosine similarity of layer-wise embeddings for BLOOMZ+24. $\Delta$1 represents the change from the unmodified BLOOMZ to the model fine-tuned with MTInstruct, and $\Delta$2 the change from MTInstruct to MT+Align.
  • Figure 4: Differences in cosine similarity of layer-wise embeddings for BLOOMZ+3. $\Delta$1 represents the change from the unmodified BLOOMZ to the model fine-tuned with MTInstruct, and $\Delta$2 the change from MTInstruct to MT+Align.
  • Figure 5: Examples of HintInstruct and ReviseInstruct.
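
The $\Delta$1 and $\Delta$2 quantities in Figures 3 and 4 can be illustrated with a small sketch. It assumes that each model yields one embedding per sentence per layer (the pooling method is not specified here and is an assumption), and that similarity at a layer is the mean cosine similarity over parallel sentence pairs. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layer_similarity(src_layers, tgt_layers):
    """Mean cosine similarity of parallel sentences at each layer.

    src_layers[l][n] is the embedding of source sentence n at layer l;
    tgt_layers[l][n] is the embedding of its translation at the same layer.
    """
    return [
        sum(cosine(u, v) for u, v in zip(src, tgt)) / len(src)
        for src, tgt in zip(src_layers, tgt_layers)
    ]


def similarity_deltas(base, mt, mt_align):
    """Per-layer changes: delta1 = MT - base, delta2 = (MT+Align) - MT."""
    delta1 = [m - b for b, m in zip(base, mt)]
    delta2 = [a - m for m, a in zip(mt, mt_align)]
    return delta1, delta2
```

A positive value at a given layer means that fine-tuning step moved the representations of parallel sentences closer together at that layer.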