Machine Translation with Large Language Models: Decoder Only vs. Encoder-Decoder

Abhinav P. M., SujayKumar Reddy M, Oswald Christopher

TL;DR

The paper addresses multilingual machine translation for Indian languages by directly comparing decoder-only and encoder-decoder large language models. It employs a structured methodology that includes in-context learning, fine-tuning, and baseline development, using datasets such as BPCC Wiki MT and benchmark metrics like BLEU, chrF, and TER. Key findings indicate that encoder-decoder models deliver reliable translation quality across multilingual tasks, while decoder-only models face convergence and context-handling challenges, with some advantages in fluency and efficiency under certain setups. The work contributes practical insights into architecture trade-offs for multilingual MT and points to Streaming Self-Attention and other techniques as promising directions to enhance performance on long-context translations in real-world multilingual settings.

Abstract

This project, titled "Machine Translation with Large Language Models: Decoder-only vs. Encoder-Decoder," aims to develop a multilingual machine translation (MT) model. Focused on Indian regional languages, especially Telugu, Tamil, and Malayalam, the model seeks to enable accurate and contextually appropriate translations across diverse language pairs. By comparing Decoder-only and Encoder-Decoder architectures, the project aims to optimize translation quality and efficiency, advancing cross-linguistic communication tools. The primary objective is to develop a model capable of delivering high-quality translations that are accurate and contextually appropriate. By leveraging large language models, specifically comparing the effectiveness of Decoder-only and Encoder-Decoder architectures, the project seeks to optimize translation performance and efficiency across multilingual contexts. Through rigorous experimentation and analysis, this project aims to advance the field of machine translation, contributing valuable insights into the effectiveness of different model architectures and paving the way for enhanced cross-linguistic communication tools.

Paper Structure (11 sections, 10 figures, 1 table)

This paper contains 11 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: A sample prompt for In-Context Learning
  • Figure 2: Sample reference results for the XGLM and mT5 predictions
  • Figure 3: Experimental Results of In-Context Learning - Language Pairs and their BLEU Scores
  • Figure 4: Workflow of mT5 Fine-tuning (Encoder-Decoder model)
  • Figure 5: Workflow of Llama2 Fine-tuning (Decoder only model)
  • ...and 5 more figures
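
Figure 1 refers to a sample prompt for in-context learning, but the prompt itself is not reproduced in this summary. As a rough illustration only, a few-shot translation prompt for a decoder-only model might be assembled as in the sketch below. The function name `build_few_shot_prompt`, the instruction wording, the `Language: sentence` layout, and the placeholder demonstration pairs are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    source_sentence: str,
    src_lang: str = "English",
    tgt_lang: str = "Telugu",
) -> str:
    """Assemble a few-shot translation prompt (hypothetical layout).

    Each demonstration pair is written as a source line followed by a
    target line; the final target line is left empty for the model to
    complete.
    """
    lines = [f"Translate the following sentences from {src_lang} to {tgt_lang}."]
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The query sentence goes last, with an open slot for the translation.
    lines.append(f"{src_lang}: {source_sentence}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Placeholder demonstrations; real pairs would come from a parallel corpus.
demo_pairs = [
    ("Good morning.", "<target-language translation 1>"),
    ("Thank you.", "<target-language translation 2>"),
]
prompt = build_few_shot_prompt(demo_pairs, "Where is the library?")
print(prompt)
```

The number of demonstration pairs and the choice of language labels are free parameters in this sketch; the prompt string would then be passed to whatever generation interface the chosen model exposes.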