Translation Analytics for Freelancers: I. Introduction, Data Preparation, Baseline Evaluations
Yuri Balashov, Alex Balashov, Shiho Fukuda Koski
TL;DR
This paper introduces Translation Analytics for freelancers, focusing on adapting traditional automatic evaluation metrics (BLEU, chrF, TER) and neural metrics (COMET) to small-scale, real-world translation work. It presents the Christopher & Dana Reeve Foundation Trilingual Corpus as a practical testbed and demonstrates both automatic and manual evaluation approaches, including correlations between automatic scores and human judgments. Key findings show that small, strategically chosen samples can yield reliable system comparisons and that COMET often aligns with human judgments, though correlations vary by language pair and sample size. The study advocates for proactive, skill-enhancing adoption of AI tools by freelancers and outlines concrete future directions to broaden scope and impact.
Abstract
This is the first in a series of papers exploring the rapidly expanding opportunities that recent progress in language technologies creates for individual translators and language service providers with modest resources. The advent of advanced neural machine translation systems and large language models, together with their integration into workflows via computer-assisted translation tools and translation management systems, has reshaped the translation landscape. These advancements enable not only translation but also quality evaluation, error spotting, glossary generation, and adaptation to domain-specific needs, creating new technical opportunities for freelancers. In this series, we aim to empower translators with actionable methods to harness these advancements. Our approach emphasizes Translation Analytics, a suite of evaluation techniques traditionally reserved for large-scale industry applications but now increasingly available to smaller-scale users. This first paper introduces a practical framework for adapting automatic evaluation metrics, such as BLEU, chrF, TER, and COMET, to freelancers' needs. We illustrate the potential of these metrics using a trilingual corpus derived from a real-world project in the medical domain and provide statistical analysis correlating human evaluations with automatic scores. Our findings emphasize the importance of proactive engagement with emerging technologies, so that translators can not only adapt to but also thrive in the evolving professional environment.
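To make the idea of correlating automatic scores with human judgments concrete, the following is a minimal sketch in Python using only the standard library. It is not the pipeline used in this paper: `simple_chrf` is a simplified character n-gram F-score in the spirit of chrF rather than the official implementation, `pearson` is a plain Pearson correlation, and the segments and human ratings are invented purely for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=3, beta=2.0):
    """Simplified chrF-style score in [0, 1] (an assumption, not official chrF)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # segment too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return (1 + beta**2) * p * r / (beta**2 * p + r)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: (machine output, reference translation, human rating 1-5)
segments = [
    ("The patient should rest for two weeks.",
     "The patient should rest for two weeks.", 5),
    ("The patient must resting two week.",
     "The patient should rest for two weeks.", 3),
    ("Weeks the two patient.",
     "The patient should rest for two weeks.", 1),
    ("Take the medication after meals.",
     "Take the medicine after each meal.", 4),
]

auto_scores = [simple_chrf(hyp, ref) for hyp, ref, _ in segments]
human_scores = [rating for _, _, rating in segments]

for (hyp, _, rating), score in zip(segments, auto_scores):
    print(f"{score:.3f}  human={rating}  {hyp}")
print("Pearson r:", round(pearson(auto_scores, human_scores), 3))
```

In practice a freelancer would likely compute BLEU, chrF, TER, and COMET with established tools and use a rank correlation alongside Pearson; the sketch only shows the shape of the analysis, that is, one automatic score per segment set against one human rating per segment.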
