LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages
Jared Coleman, Bhaskar Krishnamachari, Khalil Iskarous, Ruben Rosales
TL;DR
The paper addresses machine translation for no-resource languages by proposing LLM-Assisted Rule Based Machine Translation (LLM-RBMT), a paradigm that requires no parallel corpora. It applies the paradigm to Owens Valley Paiute (OVP), combining a rule-based sentence builder with LLM-assisted OVP→English and English→OVP translators aimed at language teaching and revitalization. Because bilingual data is lacking, evaluation relies on semantic similarity metrics; results are strong when translations stay within a constrained vocabulary, and limitations appear when the vocabulary is incomplete. The work contributes a practical, extensible toolchain for endangered-language revitalization and a framework for extending LLM-RBMT to other no-resource languages.
Abstract
We propose a new paradigm for machine translation that is particularly useful for no-resource languages (those without any publicly available bilingual or monolingual corpora): LLM-RBMT (LLM-Assisted Rule Based Machine Translation). Using the LLM-RBMT paradigm, we design the first language education/revitalization-oriented machine translator for Owens Valley Paiute (OVP), a critically endangered Indigenous American language for which there is virtually no publicly available data. We present a detailed evaluation of the translator's components: a rule-based sentence builder, an OVP to English translator, and an English to OVP translator. We also discuss the potential of the paradigm, its limitations, and the many avenues for future research that it opens up.
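To make the idea of evaluating by semantic similarity concrete, the sketch below scores two English sentences by the cosine similarity of their vector representations. It is only an illustration under stated assumptions: the function names (`bag_of_words`, `cosine_similarity`, `similarity`) are hypothetical, and the word-count vectors are a toy stand-in for whatever sentence embeddings an actual evaluation would use. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Toy sentence representation: lowercase word counts.

    Assumption: a real evaluation would use a learned sentence
    embedding here; word counts only capture lexical overlap.
    """
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[token] * b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty sentence has no direction, so define the score as 0.
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(sentence_1, sentence_2):
    """Score how close two English sentences are, in the range [0, 1]."""
    return cosine_similarity(bag_of_words(sentence_1), bag_of_words(sentence_2))
```

For example, `similarity("the dog is running", "the dog runs")` gives a score strictly between 0 and 1, because the two sentences share only some of their words. With real embeddings in place of word counts, paraphrases that share few words could still score highly, which is the property a semantic metric needs when no reference translations exist.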
