Evaluating Optimal Reference Translations

Vilém Zouhar; Věra Kloudová; Martin Popel; Ondřej Bojar

TL;DR

This paper proposes a methodology for creating more reliable document-level human reference translations, called "optimal reference translations," with the aim of raising the bar for what should be deemed "human translation quality." It then evaluates the resulting optimal reference translations against "standard" ones.

Abstract

The overall translation quality reached by current machine translation (MT) systems for high-resource language pairs is remarkably good. Standard methods of evaluation are neither suitable nor intended to uncover the many translation errors and quality deficiencies that still persist. Furthermore, the quality of standard reference translations is commonly questioned, and comparable quality levels have been reached by MT alone in several language pairs. Navigating further research in these high-resource settings is thus difficult. In this article, we propose a methodology for creating more reliable document-level human reference translations, called "optimal reference translations," with the simple aim of raising the bar for what should be deemed "human translation quality." We evaluate the obtained document-level optimal reference translations in comparison with "standard" ones, confirming a significant quality increase and also documenting the relationship between evaluation and translation editing.

Paper Structure (34 sections, 11 figures)

Introduction
Related Work
Optimal Reference Translations
Translation Creation
Annotation Campaign
Annotators
Data
Annotation Interface
Annotation Instructions
Quantitative Analysis
Annotator Questionnaire
Collected Annotations
Quality of Initial Translations
Inter-Annotator Agreement
Modelling Overall Quality from Components
...and 19 more sections

Figures (11)

Figure 1: Example translations of the same source into Czech. Literal transcriptions of the translations are shown in italics. N1: translatologist collaboration (optimal translation), P1: professional translation agency (post-edited MT), P2, P3: professional translation agency.
Figure 2: First 5 rows of a screen for a single document with source and 4 translations in parallel. Screens were accessed by annotators in an online spreadsheet program. Note: Scalable graphics -- zoom in.
Figure 3: Distribution densities of ratings of each collected variable (thin tail cropped $\geq$ 3 for higher resolution of high-density values). Numbers and horizontal lines show feature means.
Figure 4: Pearson correlations between individual features at the document level (top-left) and the segment level (bottom-right).
Figure 5: Averages of ratings for different translation sources at the document level (top-left) and the segment level (bottom-right) across features.
...and 6 more figures
