MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration

David Anugraha, Garry Kuwanto, Lucky Susanto, Derry Tanti Wijaya, Genta Indra Winata

TL;DR

MetaMetrics-MT outperforms all existing baselines in the reference-based setting, establishing a new state of the art, and achieves results comparable to leading metrics in the reference-free setting while offering greater efficiency.

Abstract

We present MetaMetrics-MT, an innovative metric designed to evaluate machine translation (MT) tasks by aligning closely with human preferences through Bayesian optimization with Gaussian Processes. MetaMetrics-MT enhances existing MT metrics by optimizing their correlation with human judgments. Our experiments on the WMT24 metric shared task dataset demonstrate that MetaMetrics-MT outperforms all existing baselines, setting a new benchmark for state-of-the-art performance in the reference-based setting. Furthermore, it achieves comparable results to leading metrics in the reference-free setting, offering greater efficiency.
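To make the idea of calibrating a combined metric against human judgments concrete, here is a minimal sketch of Bayesian optimization with a Gaussian Process surrogate. It is an illustration under stated assumptions, not the authors' implementation: the weighted-sum combination, the weight range [0, 1], the RBF kernel, the expected-improvement acquisition, the Kendall tau-a objective, and all names (`tune_weights`, `gp_posterior`, the synthetic data) are hypothetical choices made for this example.

```python
import math
import numpy as np

def kendall_tau(x, y):
    """Kendall tau-a via pairwise sign agreement (O(n^2), fine for toy data)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((dx * dy).sum() / (n * (n - 1)))

def rbf(a, b, length=0.3):
    """RBF kernel matrix between two sets of weight vectors."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Posterior mean and std of a GP (constant mean = mean of y) at points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    y_mean = y.mean()
    mu = y_mean + Ks.T @ np.linalg.solve(K, y - y_mean)
    var = 1.0 - (Ks * np.linalg.solve(K, Ks)).sum(0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best objective value seen so far."""
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def tune_weights(metric_scores, human, n_iter=20, n_cand=256, seed=0):
    """Search weights in [0, 1]^k that maximize tau(weighted sum, human)."""
    rng = np.random.default_rng(seed)
    k = metric_scores.shape[1]

    def objective(w):
        return kendall_tau(metric_scores @ w, human)

    # Initial design: each metric on its own (one-hot) plus a few random points.
    X = np.vstack([np.eye(k), rng.random((k, k))])
    y = np.array([objective(w) for w in X])
    for _ in range(n_iter):
        cand = rng.random((n_cand, k))          # random candidate weights
        mu, sigma = gp_posterior(X, y, cand)    # surrogate prediction
        w_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
        X = np.vstack([X, w_next])
        y = np.append(y, objective(w_next))
    return X[np.argmax(y)], float(y.max())

# Synthetic stand-in data: three noisy "metrics" and noisy "human" scores.
data_rng = np.random.default_rng(42)
latent = data_rng.normal(size=60)
human = latent + 0.3 * data_rng.normal(size=60)
metrics = np.stack(
    [latent + s * data_rng.normal(size=60) for s in (0.5, 0.8, 1.2)], axis=1
)
weights, tau = tune_weights(metrics, human)
```

Because the initial design includes the one-hot weight vectors, the returned correlation in this sketch is never lower than that of the best single metric on the tuning data. Whether that advantage carries over to held-out data is a separate question that the sketch does not address.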

Paper Structure

This paper contains 28 sections, 2 equations, 1 figure, 8 tables.

Figures (1)

  • Figure 1: Heatmaps showing Kendall correlation coefficients between human scores and MT metrics over 3 years of MQM datasets from the WMT shared tasks (2020-2022). Panel (a) displays correlations for the metrics used in MetaMetrics-MT, while panel (b) displays correlations for the metrics used in MetaMetrics-MT-QE.
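
As a rough illustration of the quantity plotted in such a heatmap, the sketch below computes a Kendall correlation between human scores and each metric's scores. The tau-b variant (which corrects for ties), the metric names, and the toy scores are assumptions made for this example; the caption does not specify which Kendall variant the paper uses.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length score lists (handles ties)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied in both: excluded from both denominator terms
        elif dx == 0:
            ties_x += 1         # tied only in x
        elif dy == 0:
            ties_y += 1         # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Toy data: hypothetical human scores and two hypothetical metrics.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_outputs = {
    "metric_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "metric_b": [0.2, 0.1, 0.3, 0.5, 0.4],
}
# One heatmap row: correlation of each metric with the human scores.
row = {name: kendall_tau_b(human, scores) for name, scores in metric_outputs.items()}
```

Repeating this per dataset year and per metric would yield the grid of values that a heatmap like Figure 1 displays.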