Machine Translation Testing via Syntactic Tree Pruning

Quanjun Zhang, Juan Zhai, Chunrong Fang, Jiawei Liu, Weisong Sun, Haichuan Hu, Qingyu Wang

TL;DR

This paper presents STP (syntactic tree pruning), a metamorphic testing framework for machine translation. STP prunes the syntactic dependency tree of a source sentence to generate shorter sentences that preserve its core semantics, then compares the translations of the original and pruned inputs using a bag-of-words distance. Because pruning changes sentence structure rather than swapping single words, STP uncovers translation errors that word-replacement methods miss, reporting thousands of erroneous translations with competitive precision on Google Translate and Bing Microsoft Translator. On 1,200 source sentences, STP achieves a recall of about 74% for errors in the original sentences and significantly outperforms state-of-the-art baselines in both the number of errors detected and the precision of reporting. The approach is efficient, language-agnostic to a degree, and comes with guidance on threshold selection; artifacts are available to support replication and further research.
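The comparison step can be illustrated with a minimal sketch. The exact distance used by STP is not given in this summary, so the code below assumes one plausible definition: count the words in the pruned sentence's translation that have no counterpart in the original sentence's translation. The function names, the whitespace tokenizer, and the default threshold are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def bag_of_words(translation: str) -> Counter:
    # Assumption: tokens are separated by whitespace. A target language
    # without whitespace word boundaries would need a word segmenter instead.
    return Counter(translation.lower().split())


def bow_distance(original_translation: str, pruned_translation: str) -> int:
    # Assumed distance: number of words in the pruned translation that are
    # not covered by the original translation. Counter subtraction keeps
    # only positive counts, so shared words cancel out.
    extra = bag_of_words(pruned_translation) - bag_of_words(original_translation)
    return sum(extra.values())


def is_suspicious(original_translation: str,
                  pruned_translation: str,
                  threshold: int = 1) -> bool:
    # Hypothetical reporting rule: flag the pair when the distance exceeds
    # the threshold. The threshold value here is a placeholder.
    return bow_distance(original_translation, pruned_translation) > threshold
```

Under this assumed definition, a pruned translation that only drops words has distance 0, which matches the intuition that pruning should remove content without introducing new words.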

Abstract

Machine translation systems have been widely adopted in daily life, making it easier and more convenient. Unfortunately, erroneous translations may result in severe consequences, such as financial losses. This calls for improving the accuracy and reliability of machine translation systems. However, testing machine translation systems is challenging because of the complexity and intractability of the underlying neural models. To tackle these challenges, we propose a novel metamorphic testing approach based on syntactic tree pruning (STP) to validate machine translation systems. Our key insight is that a pruned sentence should retain the crucial semantics of the original sentence. Specifically, STP (1) proposes a core-semantics-preserving pruning strategy based on basic sentence structure and dependency relations at the level of the syntactic tree representation; (2) generates source sentence pairs based on the metamorphic relation; and (3) uses a bag-of-words model to report suspicious issues whose translations break the consistency property. We evaluate STP on two state-of-the-art machine translation systems (i.e., Google Translate and Bing Microsoft Translator) with 1,200 source sentences as inputs. The results show that STP can accurately find 5,073 unique erroneous translations in Google Translate and 5,100 unique erroneous translations in Bing Microsoft Translator (400% more than state-of-the-art techniques), with 64.5% and 65.4% precision, respectively. The reported erroneous translations vary in type, and more than 90% of them cannot be found by state-of-the-art techniques. There are 9,393 erroneous translations unique to STP, which is 711.9% more than state-of-the-art techniques. Moreover, STP is effective at detecting translation errors in the original sentences, with a recall reaching 74.0%, improving on state-of-the-art techniques by 55.1% on average.
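As a rough illustration of steps (1) and (2), the sketch below prunes a hand-written dependency tree by removing one modifier subtree at a time. Everything in it is an assumption made for illustration: the `Token` structure, the `PRUNABLE` set of relations, and the example parse are hypothetical, and the actual pruning rules in STP are defined in the paper rather than in this abstract. A real pipeline would obtain the tree from a dependency parser instead of hard-coding it.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Token:
    idx: int      # position in the sentence
    text: str
    head: int     # index of the head token, -1 for the root
    deprel: str   # dependency relation to the head


# Hypothetical set of relations treated as non-core modifiers.
PRUNABLE = {"amod", "advmod", "nmod"}


def subtree(tokens: List[Token], root_idx: int) -> Set[int]:
    # Collect the indices of root_idx and all of its descendants.
    members = {root_idx}
    changed = True
    while changed:
        changed = False
        for tok in tokens:
            if tok.head in members and tok.idx not in members:
                members.add(tok.idx)
                changed = True
    return members


def pruned_sentences(tokens: List[Token]) -> List[str]:
    # Produce one pruned sentence per prunable subtree, in token order.
    results = []
    for tok in tokens:
        if tok.deprel in PRUNABLE:
            removed = subtree(tokens, tok.idx)
            kept = [t.text for t in tokens if t.idx not in removed]
            results.append(" ".join(kept))
    return results


# Hand-written parse of "The old man opened the door of the house".
EXAMPLE = [
    Token(0, "The", 2, "det"),
    Token(1, "old", 2, "amod"),
    Token(2, "man", 3, "nsubj"),
    Token(3, "opened", -1, "root"),
    Token(4, "the", 5, "det"),
    Token(5, "door", 3, "obj"),
    Token(6, "of", 8, "case"),
    Token(7, "the", 8, "det"),
    Token(8, "house", 5, "nmod"),
]

for sentence in pruned_sentences(EXAMPLE):
    print(sentence)
```

Each pruned sentence keeps the subject, verb, and object of the original, so pairing it with the original gives a source sentence pair whose translations can then be checked for consistency.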

Paper Structure

This paper contains 46 sections, 3 equations, 6 figures, 13 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Syntactic tree representation
  • Figure 2: Overview of STP
  • Figure 3: The precision of STP (# of erroneous issues/# of suspicious issues) under different threshold values
  • Figure 4: Erroneous translations by different approaches
  • Figure 5: The trade-off between the precision and the number of erroneous issues.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Definition 1