Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages

Corinne Aars, Lauren Adams, Xiaokan Tian, Zhaoyu Wang, Colton Wismer, Jason Wu, Pablo Rivas, Korn Sooksatra, Matthew Fendt

TL;DR

The paper addresses the need for faster translation of sacred texts into underrepresented languages by fine-tuning ByT5 on the Johns Hopkins Bible Corpus. It presents a methodology using byte-level, multilingual Bible data to produce verse translations and reports a BLEU score of 0.27, highlighting the approach's potential to broaden accessibility while acknowledging limitations of BLEU as a sole metric. Key contributions include demonstrating ByT5's applicability to biblical language, discussing parameter tuning, and outlining avenues to improve translation quality and scalability for diverse languages. The work underscores the practical impact of NLP tooling in promoting linguistic diversity and cultural preservation in sacred literature.

Abstract

This study presents the development and evaluation of a ByT5-based multilingual translation model tailored for translating the Bible into underrepresented languages. Utilizing the comprehensive Johns Hopkins University Bible Corpus, we trained the model to capture the intricate nuances of character-based and morphologically rich languages. Our results, measured by the BLEU score and supplemented with sample translations, suggest the model can improve accessibility to sacred texts. It effectively handles the distinctive biblical lexicon and structure, thus bridging the linguistic divide. The study also discusses the model's limitations and suggests pathways for future enhancements, focusing on expanding access to sacred literature across linguistic boundaries.
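
The abstract does not describe the input pipeline, so the following is only a minimal sketch of what "byte-level" means for a ByT5-style model: text is split into UTF-8 bytes rather than subword tokens, which is why no language-specific vocabulary is needed. It assumes the convention of the released ByT5 vocabulary, where three ids are reserved (pad = 0, EOS = 1, unk = 2) and each byte `b` maps to id `b + 3`. The function names `encode` and `decode` are hypothetical helpers for illustration and are not taken from the paper.

```python
# Illustrative sketch of ByT5-style byte-level encoding (standard library only).
# Assumption: ids 0-2 are reserved (pad, EOS, unk), so byte b maps to id b + 3.

SPECIAL_OFFSET = 3   # number of reserved ids before the byte range
EOS_ID = 1           # end-of-sequence id appended to every input


def encode(text: str) -> list[int]:
    """Map text to ids: one id per UTF-8 byte, followed by EOS."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode(ids: list[int]) -> str:
    """Invert encode, skipping reserved ids and anything outside the byte range."""
    data = bytes(
        i - SPECIAL_OFFSET
        for i in ids
        if SPECIAL_OFFSET <= i < SPECIAL_OFFSET + 256
    )
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    verse = "Ἐν ἀρχῇ ἦν ὁ λόγος"
    ids = encode(verse)
    # Non-ASCII scripts expand to several ids per character.
    print(len(verse), "characters ->", len(ids), "ids")
    print(decode(ids) == verse)
```

One consequence visible in the sketch is that sequences for non-Latin scripts grow to several ids per character, which is the usual cost of byte-level models in exchange for covering any language without a dedicated tokenizer.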

Paper Structure (10 sections, 2 tables)