OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages

Raphaël Merx, Hanna Suominen, Trevor Cohn, Ekaterina Vylomova

TL;DR

This work fills a critical gap in health MT evaluation by introducing OpenWHO, a document-level parallel corpus drawn from WHO e-learning content, spanning 2,978 documents and 26,824 sentences across 20+ languages, nine of which are low-resource. It benchmarks modern large language models (LLMs) against traditional neural MT (NMT) baselines, showing that, in health translation for low-resource languages, LLMs with document-level context (notably Gemini 2.5 Flash) outperform NMT baselines, with gains of up to +4.79 ChrF points. The study also demonstrates that the benefits of document-level translation are domain- and model-dependent, with stronger improvements in specialised domains like health and literature than in general news, and highlights the trade-offs in error types between LLMs and NMT. By releasing OpenWHO under a CC BY-NC 4.0 license, the authors provide a benchmark to drive research on low-resource health MT and context-aware translation strategies, with implications for more effective dissemination of health information globally.

Abstract

In machine translation (MT), health is a high-stakes domain characterised by widespread deployment and domain-specific vocabulary. However, there is a lack of MT evaluation datasets for low-resource languages in this domain. To address this gap, we introduce OpenWHO, a document-level parallel corpus of 2,978 documents and 26,824 sentences from the World Health Organization's e-learning platform. Sourced from expert-authored, professionally translated materials shielded from web-crawling, OpenWHO spans a diverse range of over 20 languages, of which nine are low-resource. Leveraging this new resource, we evaluate modern large language models (LLMs) against traditional MT models. Our findings reveal that LLMs consistently outperform traditional MT models, with Gemini 2.5 Flash achieving a +4.79 ChrF point improvement over NLLB-54B on our low-resource test set. Further, we investigate how LLM context utilisation affects accuracy, finding that the benefits of document-level translation are most pronounced in specialised domains like health. We release the OpenWHO corpus to encourage further research into low-resource MT in the health domain.
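The improvement above is reported in ChrF, a character n-gram F-score on a 0 to 100 scale. As a rough illustration of what such a score measures, the following is a minimal sentence-level sketch, assuming character n-grams up to order 6, whitespace ignored, and recall weighted with beta = 2. The function names `char_ngrams` and `simple_chrf` are hypothetical, and this is not the evaluation code used in the paper; established toolkits differ in details such as whitespace handling and corpus-level aggregation.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring spaces (an assumption of this sketch)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level character n-gram F-score, scaled to 0-100."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # Skip orders longer than either string.
            continue
        # Clipped overlap: each n-gram counts at most as often as it appears in both.
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    reference = "wash your hands with soap and water"
    print(simple_chrf("wash your hands with soap and water", reference))
    print(simple_chrf("wash hands using soap", reference))
```

Under this sketch, a hypothesis identical to the reference scores 100 and one sharing no characters scores 0. A system-level gain such as the one reported in the abstract is a difference between two such scores computed over a whole test set.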

Paper Structure

This paper contains 53 sections, 2 figures, 12 tables.

Figures (2)

  • Figure 1: Overview of the OpenWHO parallel dataset, highlighting its depth across low-resource languages and scripts.
  • Figure 2: Number of parallel sentences per language in the OpenWHO dataset. The English source has 50,898 sentences. Low-resource languages covered in our experiments (see the Experiments section) are in bold.