Small Open Models Achieve Near Parity with Large Models in Low Resource Literary Translation at a Fraction of the Cost

Mihai Nadas, Laura Diosan, Andreea Tomescu, Andrei Piscoran

TL;DR

The paper tackles low-resource English–Romanian literary translation by constructing a cost-aware, end-to-end TF2 pipeline that combines synthetic data with parameter-efficient fine-tuning of open models. It introduces DS-TF2-EN-RO-15K and DS-TF2-EN-RO-3M, and demonstrates that fine-tuned open backbones (TF2-1B/4B/12B with LoRA) can narrow the gap to proprietary systems while dramatically reducing costs (≈$350 vs. $13,500–$270,000 for API-based translation across 3M fables). An evaluation framework blends corpus BLEU with a five-dimension LLM-based rubric (accuracy, fluency, coherence, style, cultural adaptation) and human judgments used as anchors, revealing strong rubric-based gains for TF2 models and confirming robustness to the choice of judge family. The work releases all datasets, prompts, and scripts to enable reproducible, low-cost literary MT research and suggests future extensions to other languages and genres. Overall, TF2 demonstrates that cost-efficient, domain-adaptive open models can achieve near-parity with large proprietary models for narrative translation while supporting on-device deployment and cultural preservation of literature.
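To make the cost comparison concrete, the short sketch below works through the arithmetic implied by the figures quoted in the TL;DR. Only the three dollar totals and the 3M fable count come from the summary; the function names are hypothetical, and the per-fable and ratio values are derived here rather than reported by the authors.

```python
# Figures quoted in the TL;DR; everything else is derived arithmetic.
NUM_FABLES = 3_000_000          # size of the full English fable corpus
TF2_COST_USD = 350              # approximate cost with the fine-tuned open model
API_COST_LOW_USD = 13_500       # lower bound quoted for API-based translation
API_COST_HIGH_USD = 270_000     # upper bound quoted for API-based translation


def per_fable(total_usd: float, n: int = NUM_FABLES) -> float:
    """Average cost of translating one fable, in USD."""
    return total_usd / n


def savings_factor(api_usd: float, open_usd: float = TF2_COST_USD) -> float:
    """How many times more expensive the API route is than the open model."""
    return api_usd / open_usd


if __name__ == "__main__":
    for label, total in [
        ("TF2 (open model)", TF2_COST_USD),
        ("API, low estimate", API_COST_LOW_USD),
        ("API, high estimate", API_COST_HIGH_USD),
    ]:
        print(f"{label:>20}: ${per_fable(total):.6f} per fable")
    print(f"Savings factor (low):  {savings_factor(API_COST_LOW_USD):.1f}x")
    print(f"Savings factor (high): {savings_factor(API_COST_HIGH_USD):.1f}x")
```

Under these figures the API route is roughly 39 to 771 times more expensive than the open-model route, which is the gap the title's "fraction of the cost" refers to.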

Abstract

Literary translation has recently gained attention as a distinct and complex task in machine translation research. However, literary translation with small open models remains an open problem. We contribute to this ongoing research by introducing the TinyFabulist Translation Framework (TF2), a unified framework for dataset creation, fine-tuning, and evaluation in English–Romanian literary translation, centered on the creation and open release of both a compact, fine-tuned language model (TF2-12B) and large-scale synthetic parallel datasets (DS-TF2-EN-RO-3M and DS-TF2-EN-RO-15K). Building on DS-TF1-EN-3M (TF1), the largest collection of synthetic English fables to date, we address the need for rich, high-quality literary datasets in low-resource languages such as Romanian. Our pipeline first generates 15k high-quality Romanian reference translations from the TF1 pool using a high-performing LLM. We then apply a two-stage fine-tuning process to a 12B-parameter open-weight model: (i) instruction tuning to capture genre-specific narrative style, and (ii) adapter compression for efficient deployment. Evaluation combines corpus-level BLEU with a five-dimension LLM-based rubric (accuracy, fluency, coherence, style, and cultural adaptation) to provide a nuanced assessment of translation quality. Results show that our fine-tuned model achieves strong fluency and adequacy, narrowing the gap to top-performing proprietary models under automated and human-anchored evaluation, while being open, accessible, and significantly more cost-effective. Alongside the fine-tuned model and both datasets, we publicly release all scripts and evaluation prompts. TF2 thus provides an end-to-end, reproducible pipeline for research on cost-efficient translation, cross-lingual narrative generation, and the broad adoption of open models for culturally significant literary content in low-resource settings.
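As an illustration of how scores on the five rubric dimensions might be collapsed into a single number per translation, here is a minimal sketch. The unweighted mean, the 1–5 scale in the example, and the names `DIMENSIONS` and `rubric_mean` are assumptions made for this example; the abstract names the five dimensions but does not specify the scale or the aggregation the authors use.

```python
from statistics import mean

# The five rubric dimensions named in the abstract.
DIMENSIONS = ("accuracy", "fluency", "coherence", "style", "cultural_adaptation")


def rubric_mean(scores: dict) -> float:
    """Unweighted mean over the five dimensions (an assumed aggregation)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing rubric dimensions: {missing}")
    return mean(scores[d] for d in DIMENSIONS)


if __name__ == "__main__":
    # Hypothetical judge output for one translated fable, on an assumed 1-5 scale.
    example = {
        "accuracy": 4,
        "fluency": 5,
        "coherence": 4,
        "style": 3,
        "cultural_adaptation": 4,
    }
    print(rubric_mean(example))
```

Keeping the per-dimension scores alongside any aggregate preserves the nuance the rubric is meant to provide, since a single mean can hide a weak dimension such as style.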

Paper Structure

This paper contains 40 sections, 3 equations, 1 figure, 9 tables.

Figures (1)

  • Figure 1: TinyFabulist Translation Framework (TF2) pipeline: evaluation of translation models on literary benchmarks (S1); creation of a 15k English–Romanian parallel corpus via the top-ranked system (S2); parameter-efficient fine-tuning of open LLMs and quantized variants (S3); and large-scale translation of the full English corpus (S4).