MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

TL;DR

This paper investigates the figurative quality of MT and proposes a set of human evaluation metrics focused on the translation of figurative language, finding that translations of figurative expressions display different traits from literal ones.

Abstract

Machine Translation (MT) has developed rapidly since the release of Large Language Models. Current MT evaluation is performed either through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper, we investigate the figurative quality of MT and propose a set of human evaluation metrics focused on the translation of figurative language. We additionally present a multilingual parallel metaphor corpus generated by post-editing. Our evaluation protocol is designed to estimate four aspects of MT: Metaphorical Equivalence, Emotion, Authenticity, and Quality. In doing so, we observe that translations of figurative expressions display different traits from literal ones.

Paper Structure

This paper contains 28 sections, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Chinese and English metaphorical expressions of being drunk.
  • Figure 2: The dataset creation framework. By translating, annotating, and post-editing, we create a cross-lingual metaphor dataset. Specific details of these sub-steps are elaborated in the sections on translating, annotating, and post-editing, respectively.
  • Figure 3: Equivalence distributions of metaphorical and literal expression translations from annotators. The labels "non equi", "part equi", and "full equi" refer to non-, part-, and full-equivalence, respectively, and "mis" denotes mistranslation.
  • Figure 4: Pearson correlation heatmap of manual evaluation quality.
  • Figure 5: Emotion-Equivalence correlation heatmap based on co-occurrences.
  • ...and 3 more figures