Rule-Based, Neural and LLM Back-Translation: Comparative Insights from a Variant of Ladin
Samuel Frontull, Georg Moser
TL;DR
Facing scarce parallel data for Val Badia Ladin, the paper compares three back-translation (BT) strategies: a fine-tuned neural MT (NMT) model, a rule-based MT (RBMT) system, and a large language model (LLM). The study trains several BT-augmented systems using monolingual Ladin and Italian data and reports BLEU and chrF++ alongside perplexity, finding that all three BT approaches yield comparable translation quality in this low-resource scenario. Round-trip translations reveal distinct model behaviors: RBMT provides the most stable back-translation, while the LLM delivers fluent output with variable fidelity. The work highlights the complementary strengths of these paradigms and publicly releases its resources to foster future Ladin MT research.
Abstract
This paper explores the impact of different back-translation approaches on machine translation for Ladin, specifically the Val Badia variant. Given the limited amount of parallel data available for this language (only 18k Ladin-Italian sentence pairs), we investigate the performance of a multilingual neural machine translation model fine-tuned for Ladin-Italian. In addition to the available authentic data, we synthesise further translations by using three different models: a fine-tuned neural model, a rule-based system developed specifically for this language pair, and a large language model. Our experiments show that all approaches achieve comparable translation quality in this low-resource scenario, yet round-trip translations highlight differences in model performance.
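As a rough illustration of the two ideas the abstract relies on, the following minimal sketch shows how back-translation can turn monolingual target-side sentences into synthetic training pairs, and how a round-trip translation is formed. It is not the authors' pipeline: the function names, the toy word-by-word lexicon, and the placeholder tokens (`s1`, `t1`, and so on) are all hypothetical, and any real system (neural, rule-based, or LLM) would take the place of the `Translator` callables.

```python
from typing import Callable, Dict, List, Tuple

# A translator is modelled as any callable mapping a sentence to a sentence.
# In practice this could wrap an NMT model, a rule-based system, or an LLM.
Translator = Callable[[str], str]
Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(target_sentences: List[str], target_to_source: Translator) -> List[Pair]:
    """Pair each authentic target sentence with a synthetic source sentence."""
    return [(target_to_source(t), t) for t in target_sentences]


def augment(authentic: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Combine authentic and synthetic pairs into one training set."""
    return list(authentic) + list(synthetic)


def round_trip(sentence: str, forward: Translator, backward: Translator) -> str:
    """Translate to the other language and back again."""
    return backward(forward(sentence))


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Hypothetical stand-in: word-by-word lookup, unknown tokens pass through."""
    def translate(sentence: str) -> str:
        return " ".join(lexicon.get(tok, tok) for tok in sentence.split())
    return translate


# Placeholder vocabulary; these tokens are not real Ladin or Italian words.
SRC_TO_TGT = {"s1": "t1", "s2": "t2"}
TGT_TO_SRC = {tgt: src for src, tgt in SRC_TO_TGT.items()}
src_to_tgt = make_toy_translator(SRC_TO_TGT)
tgt_to_src = make_toy_translator(TGT_TO_SRC)

authentic = [("s1", "t1")]                         # small authentic parallel corpus
synthetic = back_translate(["t2 t1"], tgt_to_src)  # from monolingual target data
training = augment(authentic, synthetic)
```

Comparing `round_trip(sentence, src_to_tgt, tgt_to_src)` against the original sentence gives a simple view of how much a given pair of systems preserves the input, which is the kind of comparison the round-trip analysis draws on.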
