
Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations

Yuri Balashov, Alex Balashov, Shiho Fukuda Koski

TL;DR

This paper introduces Translation Analytics for freelancers, focusing on adapting traditional automatic evaluation metrics (BLEU, chrF, TER) and neural metrics (COMET) to small-scale, real-world translation work. It presents the Christopher & Dana Reeve Foundation Trilingual Corpus (RFTC) as a practical testbed and demonstrates both automatic and manual evaluation approaches, including correlations between automatic scores and human judgments. Key findings show that small, strategically chosen samples can yield reliable system comparisons and that COMET often aligns with human judgments, though correlations vary by language pair and sample size. The study advocates proactive, skill-enhancing adoption of AI tools by freelancers and outlines concrete directions for future work that would broaden the study's scope and impact.
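Correlating automatic scores with human judgments, as mentioned above, usually comes down to a correlation coefficient over per-segment (or per-system) scores. The sketch below is not the authors' analysis code; it assumes Pearson's r and Spearman's rank correlation as the statistics of interest, and the function names and the toy numbers are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed scores."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-segment scores for one system (illustrative numbers only).
metric_scores = [0.81, 0.64, 0.92, 0.55, 0.73]  # e.g. COMET-style scores
human_scores = [4, 3, 5, 2, 3]                  # e.g. 1-5 adequacy ratings

print(f"Pearson r    = {pearson(metric_scores, human_scores):.3f}")
print(f"Spearman rho = {spearman(metric_scores, human_scores):.3f}")
```

Spearman's rho only asks whether the metric and the human raters order the segments the same way, which is often the more relevant question when choosing between MT engines or LLMs; with samples as small as a freelancer's, either coefficient should be read with its sample size in mind.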

Abstract

This is the first in a series of papers exploring the rapidly expanding opportunities that recent progress in language technologies opens up for individual translators and language service providers with modest resources. The advent of advanced neural machine translation systems and large language models, together with their integration into workflows via computer-assisted translation tools and translation management systems, has reshaped the translation landscape. These advances enable not only translation but also quality evaluation, error spotting, glossary generation, and adaptation to domain-specific needs, creating new technical opportunities for freelancers. In this series, we aim to give translators actionable methods for harnessing these advances. Our approach emphasizes Translation Analytics, a suite of evaluation techniques traditionally reserved for large-scale industry applications but now increasingly available to smaller-scale users. This first paper introduces a practical framework for adapting automatic evaluation metrics, such as BLEU, chrF, TER, and COMET, to freelancers' needs. We illustrate the potential of these metrics using a trilingual corpus derived from a real-world project in the medical domain and provide a statistical analysis correlating human evaluations with automatic scores. Our findings emphasize the importance of proactive engagement with emerging technologies, so that translators not only adapt to but thrive in the evolving professional environment.
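Of the metrics named above, chrF is simple enough to sketch in a few lines, which helps show what a string-overlap metric actually measures. The following is a simplified, sentence-level chrF-style score, assuming character n-grams of orders 1 to 6 with whitespace removed and β = 2, with precision and recall averaged over n-gram orders before the F-score is computed. It is an illustration only: it is not the paper's code and will not reproduce sacreBLEU's corpus-level chrF exactly, so a maintained implementation is the safer choice for real evaluations. The function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n, ignoring all whitespace."""
    chars = "".join(text.split())  # drop whitespace before extracting n-grams
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF-style score on a 0-100 scale."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram matches (min of counts)
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average precision over orders
    r = sum(recalls) / len(recalls)        # average recall over orders
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

For example, `chrf("the cat sat", "the cat sat down")` scores below 100: every hypothesis n-gram occurs in the reference (perfect precision), but the missing final characters lower recall, and with β = 2 recall weighs more heavily than precision.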

Paper Structure

This paper contains 32 sections, 1 equation, 4 figures, and 19 tables.

Figures (4)

  • Figure 1: Visualization of MATEO-generated metric scores for EN-RU and EN-JA translations, broken down by MT engine and LLM, for 1-8_en_short.
  • Figure 2: Automatic evaluation scores across the outputs of six translation systems for three non-overlapping parts of the RFTC corpus: 229, 1143, and 2183 segments.
  • Figure 3: Carbon footprint for a PC with a 13th Gen Intel Core i9-13900KF CPU (3.00 GHz) and 64.0 GB of RAM.
  • Figure 4: Carbon footprint for a virtual server with a 6-core Intel Xeon (Skylake) CPU and 16 GB of RAM.