Should I Share this Translation? Evaluating Quality Feedback for User Reliance on Machine Translation

Dayeon Ki; Kevin Duh; Marine Carpuat

TL;DR

This study investigates how different forms of quality feedback influence monolingual users' willingness to share machine-translated content. It contrasts explicit feedback (error highlights, LLM explanations) with implicit feedback (backtranslation, QA tables) in a COVID-19 information scenario, using decision accuracy and confidence-weighted accuracy as key metrics. The findings show that implicit QA-table feedback yields the strongest improvements in both accuracy and appropriate reliance, while error highlights underperform. The work highlights the value of feedback that encourages users to judge translations themselves rather than prescribing a course of action, informing the design of MT quality feedback in real-world, user-centric contexts.

Abstract

As people increasingly use AI systems in work and daily life, feedback mechanisms that help them use AI responsibly are urgently needed, particularly in settings where users are not equipped to assess the quality of AI predictions. We study a realistic Machine Translation (MT) scenario where monolingual users decide whether to share an MT output, first without and then with quality feedback. We compare four types of quality feedback: explicit feedback that directly gives users an assessment of translation quality using (1) error highlights and (2) LLM explanations, and implicit feedback that helps users compare MT inputs and outputs through (3) backtranslation and (4) question-answer (QA) tables. We find that all feedback types, except error highlights, significantly improve both decision accuracy and appropriate reliance. Notably, implicit feedback, especially QA tables, yields significantly greater gains than explicit feedback in terms of decision accuracy, appropriate reliance, and user perceptions, receiving the highest ratings for helpfulness and trust, and the lowest for mental burden.
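
The TL;DR names two metrics, decision accuracy and confidence-weighted accuracy. The Python sketch below shows one plausible way to compute them from per-item share decisions. The scoring rule for confidence-weighted accuracy (a correct decision scores 0.5 + c/2, an incorrect one 0.5 - c/2, with confidence c scaled to [0, 1]) is an assumption made for illustration, not a definition taken from this summary, and the `Decision` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Decision:
    """One participant judgment on one MT output (hypothetical record)."""
    shared: bool          # participant chose to share the translation
    mt_is_accurate: bool  # gold label: the translation was safe to share
    confidence: float     # self-reported confidence, scaled to [0, 1]


def is_correct(d: Decision) -> bool:
    # A decision is correct when sharing matches the gold label:
    # share accurate translations, withhold inaccurate ones.
    return d.shared == d.mt_is_accurate


def decision_accuracy(decisions: Sequence[Decision]) -> float:
    """Fraction of share / do-not-share decisions that are correct."""
    if not decisions:
        raise ValueError("need at least one decision")
    return sum(is_correct(d) for d in decisions) / len(decisions)


def confidence_weighted_accuracy(decisions: Sequence[Decision]) -> float:
    """Mean per-decision score under the ASSUMED scoring rule.

    Confident correct decisions approach 1, confident incorrect
    decisions approach 0, and zero-confidence decisions score 0.5.
    """
    if not decisions:
        raise ValueError("need at least one decision")
    total = 0.0
    for d in decisions:
        half = d.confidence / 2
        if is_correct(d):
            total += 0.5 + half
        else:
            total += 0.5 - half
    return total / len(decisions)


sample = [
    Decision(shared=True, mt_is_accurate=True, confidence=1.0),
    Decision(shared=True, mt_is_accurate=False, confidence=1.0),
    Decision(shared=False, mt_is_accurate=False, confidence=0.5),
    Decision(shared=False, mt_is_accurate=True, confidence=0.0),
]
print(decision_accuracy(sample))
print(confidence_weighted_accuracy(sample))
```

Under this assumed rule, a participant who is confidently wrong is penalized more than one who is unsure and wrong, which is one way to operationalize appropriate reliance: confidence should track correctness.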
