Who Has the Final Word? Designing Multi-Agent Collaborative Framework for Professional Translators
George X. Wang, Jiaqian Hu, Jing Qian
TL;DR
This work tackles the limitations of automated translation in high-stakes domains by introducing CHORUS, a human–AI, multi-agent framework that operationalizes Multidimensional Quality Metrics (MQM) as a set of specialized agents coordinated by a Dimension Router and grounded in a shared Translation Memory. The translator retains final authority, using two explicit interaction loops to guide iterative refinements across focused quality dimensions such as accuracy, terminology, and style. An expert formative study motivates the design, and an extensive evaluation across language pairs and LLM backbones shows that CHORUS yields meaningful, statistically significant improvements in semantic fidelity and contextual adaptation, particularly for linguistically distant pairs. The results support the practicality of interpretable, human-guided AI systems for professional translation and point to future human-in-the-loop studies to quantify workflow impact and broaden applicability to other expert domains.
Abstract
Recent advances in LLM-based translation have renewed interest in fully automated systems, yet professional translators remain essential in high-stakes domains where decisions about accuracy, terminology, style, and audience cannot be safely automated. Current tools are typically single-shot generators or single-agent self-refiners, offering limited support for translators' multidimensional decision-making and little structured leverage for translator input. We present CHORUS, a human-AI multi-agent collaborative translation framework grounded in Multidimensional Quality Metrics (MQM), which decomposes quality dimensions into specialized agents and integrates their feedback into an iterative refinement loop controlled by the translator. A six-user preliminary study with professional translators found that CHORUS consistently outperformed zero-shot and single-agent baselines, showing that MQM-aligned multi-agent collaboration better supports professional translation workflows than autonomous generation.
