Machine Translation Testing via Syntactic Tree Pruning

Quanjun Zhang; Juan Zhai; Chunrong Fang; Jiawei Liu; Weisong Sun; Haichuan Hu; Qingyu Wang

TL;DR

This paper presents STP (syntactic tree pruning), a metamorphic testing framework for machine translation. STP prunes the syntactic dependency tree of a source sentence to generate shorter sentences that preserve its core semantics, then compares the translations of the original and pruned inputs using a bag-of-words distance. Because pruning perturbs sentence structure rather than individual words, STP uncovers a broader range of translation errors than word-replacement methods, reporting thousands of erroneous translations with competitive precision on Google Translate and Bing Microsoft Translator. On a real-world dataset of 1,200 source sentences, STP achieves a recall of 74.0% for errors in the original sentences and outperforms state-of-the-art baselines in both the number of errors detected and the precision of reporting. The approach is efficient and, to a degree, language-agnostic. The paper also gives actionable guidance on threshold selection, and artifacts are available to support replication and further research.

Abstract

Machine translation systems have been widely adopted in daily life, making it easier and more convenient. Unfortunately, erroneous translations may have severe consequences, such as financial losses. This calls for improving the accuracy and reliability of machine translation systems. However, testing machine translation systems is challenging because of the complexity and intractability of the underlying neural models. To tackle these challenges, we propose a novel metamorphic testing approach based on syntactic tree pruning (STP) to validate machine translation systems. Our key insight is that a pruned sentence should retain the crucial semantics of the original sentence. Specifically, STP (1) proposes a core-semantics-preserving pruning strategy that uses basic sentence structure and dependency relations at the level of the syntactic tree representation; (2) generates source sentence pairs based on the metamorphic relation; and (3) uses a bag-of-words model to report suspicious issues whose translations break the consistency property. We evaluate STP on two state-of-the-art machine translation systems (i.e., Google Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs. The results show that STP accurately finds 5,073 unique erroneous translations in Google Translate and 5,100 unique erroneous translations in Bing Microsoft Translator (400% more than state-of-the-art techniques), with 64.5% and 65.4% precision, respectively. The reported erroneous translations vary in type, and more than 90% of them cannot be found by state-of-the-art techniques. There are 9,393 erroneous translations unique to STP, which is 711.9% more than state-of-the-art techniques. Moreover, STP is effective at detecting translation errors in the original sentences, with a recall of 74.0%, improving on state-of-the-art techniques by 55.1% on average.
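The consistency check in step (3) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace-style tokenization, the exact distance (the number of words in the pruned translation that the original translation does not cover), and the default threshold are all assumptions made for illustration. The French strings are invented stand-ins for translator output.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-case the text and count its word tokens (assumed tokenization)."""
    return Counter(re.findall(r"\w+", text.lower()))


def bow_distance(original_translation: str, pruned_translation: str) -> int:
    """Count words of the pruned translation not covered by the original one.

    Assumption: a pruned sentence keeps only part of the original meaning,
    so its translation should reuse words from the original translation.
    """
    extra = bag_of_words(pruned_translation) - bag_of_words(original_translation)
    return sum(extra.values())


def is_suspicious(original_translation: str, pruned_translation: str,
                  threshold: int = 0) -> bool:
    """Report the pair when the distance exceeds a (hypothetical) threshold."""
    return bow_distance(original_translation, pruned_translation) > threshold


# Invented example translations of an original sentence and a pruned one.
original = "le chat noir dort sur le canapé"
consistent_pruned = "le chat dort"
inconsistent_pruned = "le chien dort"

print(bow_distance(original, consistent_pruned))
print(bow_distance(original, inconsistent_pruned))
print(is_suspicious(original, inconsistent_pruned))
```

Raising the threshold in a check like this trades the number of reported issues against precision, which is the trade-off the paper studies when it discusses threshold selection. For target languages written without spaces, such as Chinese, a word segmenter would have to replace the regular expression used here.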

Paper Structure (46 sections, 3 equations, 6 figures, 13 tables, 1 algorithm)

Introduction
Background & Motivation
Basic Sentence Structure
Syntactic Tree Representation
A Motivating Example
Approach and Implementation
Core Semantics-preserving Pruned Sentences Generation
Metamorphism-based Source Pair Generation
Consistency-based Translation Error Detection
Experimental Setup
Research Questions
Machine Translation Systems
Dataset
Labelling
Comparison
...and 31 more sections

Figures (6)

Figure 1: Syntactic tree representation
Figure 2: Overview of STP
Figure 3: The precision of STP (# of erroneous issues/# of suspicious issues) under different threshold values
Figure 4: Erroneous translations by different approaches
Figure 5: The trade-off between the precision and the number of erroneous issues.
...and 1 more figures

Theorems & Definitions (1)

Definition 1
