On Translating Technical Terminology: A Translation Workflow for Machine-Translated Acronyms

Richard Yue, John E. Ortega, Kenneth Ward Church

TL;DR

The paper addresses the persistent problem of acronym mistranslation in machine translation (MT) by proposing a terminology-focused workflow that explicitly disambiguates the long and short forms of terms. It introduces a publicly available acronym corpus and a verification step that combines AI-assisted hypothesis generation (SciBERT fine-tuned on term-acronym pairs, plus AB3P extraction) with Boolean retrieval over arXiv and PubMed sources. Empirical results show that adding the verification step substantially improves acronym agreement and verification over standard baselines such as OpusMT and Google Translate, which demonstrates practical value for professional translators. The work highlights the importance of domain-specific terminology handling in MT pipelines and provides a concrete, reproducible path for improving acronym accuracy in translation output, with potential applicability to generative MT systems as well.

Abstract

A professional translator's typical workflow for translating a document from its source language (SL) to a target language (TL) is not always focused on what many language models in natural language processing (NLP) do: predict the next word in a series of words. While high-resource languages such as English and French are reported to achieve near human parity on common metrics such as BLEU and COMET, we find that an important step is being missed: the translation of technical terms, specifically acronyms. Some publicly available state-of-the-art machine translation (MT) systems, such as Google Translate, can be erroneous when dealing with acronyms, with error rates as high as 50% in our findings. This article addresses acronym disambiguation for MT systems by proposing an additional step in the SL-TL (FR-EN) translation workflow: we first offer a new acronym corpus for public consumption and then experiment with a search-based thresholding algorithm that achieves an increase of nearly 10% compared to Google Translate and OpusMT.
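
To make the idea of search-based verification with a threshold more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a small in-memory list of documents standing in for arXiv and PubMed search results, treats a candidate as verified when the long form and the acronym co-occur (Boolean AND) in at least `min_hits` documents, and uses hypothetical names throughout (`boolean_and_hits`, `verify_candidates`, `min_hits`).

```python
import re
from typing import Dict, Iterable, List, Tuple


def boolean_and_hits(long_form: str, acronym: str, documents: Iterable[str]) -> int:
    """Count documents that contain both the long form and the acronym."""
    long_form_lc = long_form.lower()
    hits = 0
    for doc in documents:
        # The acronym must appear as a whole, case-sensitive token;
        # the long form is matched case-insensitively as a substring.
        tokens = set(re.findall(r"[A-Za-z0-9]+", doc))
        if long_form_lc in doc.lower() and acronym in tokens:
            hits += 1
    return hits


def verify_candidates(
    candidates: List[Tuple[str, str]],
    documents: Iterable[str],
    min_hits: int = 2,  # assumed threshold, chosen only for illustration
) -> Dict[Tuple[str, str], bool]:
    """Mark each (long form, acronym) candidate as verified or not."""
    docs = list(documents)
    return {
        pair: boolean_and_hits(pair[0], pair[1], docs) >= min_hits
        for pair in candidates
    }


# Toy corpus standing in for retrieved search results (illustrative only).
toy_documents = [
    "Machine translation (MT) systems often mistranslate acronyms.",
    "We evaluate MT quality; machine translation is hard.",
    "La traduction automatique (TA) est difficile.",
]

# Two candidate English renderings for a French source acronym.
toy_candidates = [("machine translation", "MT"), ("machine translation", "TA")]

results = verify_candidates(toy_candidates, toy_documents, min_hits=2)
for (long_form, acronym), verified in results.items():
    print(f"{long_form} ({acronym}): {'verified' if verified else 'rejected'}")
```

In this toy setup, the pair with the English acronym is supported by two documents and passes the threshold, while the pair that carries over the French acronym finds no supporting document and is rejected. A real workflow would replace the in-memory list with queries against a search index and would need to tune the threshold empirically.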

Paper Structure (12 sections, 7 tables)