Machine Translation with Large Language Models: Decoder Only vs. Encoder-Decoder
Abhinav P. M., SujayKumar Reddy M, Oswald Christopher
TL;DR
The paper addresses multilingual machine translation for Indian languages by directly comparing decoder-only and encoder-decoder large language models. It employs a structured methodology that includes in-context learning, fine-tuning, and baseline development, using datasets such as BPCC Wiki MT and benchmark metrics like BLEU, chrF, and TER. Key findings indicate that encoder-decoder models deliver reliable translation quality across multilingual tasks, while decoder-only models face convergence and context-handling challenges, with some advantages in fluency and efficiency under certain setups. The work contributes practical insights into architecture trade-offs for multilingual MT and points to Streaming Self-Attention and other techniques as promising directions to enhance performance on long-context translations in real-world multilingual settings.
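To make the character-level metric mentioned above concrete, the following is a minimal sketch of a chrF-style score written in plain Python. It is a simplified illustration, not the evaluation code used in the paper and not the reference chrF implementation: the function names `char_ngrams` and `simple_chrf` are hypothetical, and the choices of n-gram orders 1 to 6, beta = 2, whitespace removal, and a 0 to 1 scale are assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring whitespace (an assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF-style score on a 0-1 scale.

    Averages character n-gram precision and recall over the orders that
    both strings are long enough to support, then combines them into an
    F-beta score (beta > 1 weights recall more heavily than precision).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)


if __name__ == "__main__":
    # Works on any Unicode text, including Telugu, Tamil, or Malayalam strings.
    print(simple_chrf("the cat sat on the mat", "the cat is on the mat"))
```

Because the score is computed over characters rather than words, it gives partial credit for near-matching word forms, which is one common reason character-level metrics are reported alongside BLEU and TER for morphologically rich languages.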
Abstract
This project, titled "Machine Translation with Large Language Models: Decoder-only vs. Encoder-Decoder," aims to develop a multilingual machine translation (MT) model focused on Indian regional languages, especially Telugu, Tamil, and Malayalam. The primary objective is a model that delivers accurate and contextually appropriate translations across diverse language pairs. By leveraging large language models and comparing the effectiveness of Decoder-only and Encoder-Decoder architectures, the project seeks to optimize translation quality and efficiency across multilingual contexts. Through rigorous experimentation and analysis, it aims to advance the field of machine translation by contributing insights into the effectiveness of different model architectures and paving the way for enhanced cross-linguistic communication tools.
