(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts

Minghao Wu; Jiahao Xu; Yulin Yuan; Gholamreza Haffari; Longyue Wang; Weihua Luo; Kaifu Zhang

TL;DR

This work introduces TransAgents, a novel multi-agent framework that mimics a human translation company to tackle ultra-long literary texts. It partitions the process into preparation and execution stages, leveraging specialized roles (CEO, Senior Editor, Junior Editor, Translator, Localization Specialist, Proofreader) and two collaborative strategies (Addition-by-Subtraction and Trilateral Collaboration) to produce high-quality translations. To evaluate literary translation beyond surface similarity, the paper proposes Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP), and shows that TransAgents can outperform baselines in practical quality (Gemba-DA) and user preference despite lower d-BLEU scores, a gap attributed largely to greater output diversity. The findings highlight the promise of structured multi-agent collaboration for long-form translation, while also revealing limitations in short-text translation and the importance of memory management, agent profiling, and evaluation design for real-world applicability.
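To illustrate why a single-reference n-gram metric can penalize diverse but valid translations, the sketch below implements a plain corpus BLEU in Python and applies it at the document level. It assumes that d-BLEU means BLEU computed with each whole document (its sentences joined) as one segment, and it further assumes whitespace tokenization, one reference per document, and no smoothing. The names `corpus_bleu` and `d_bleu` are illustrative, not the paper's evaluation code; a real evaluation would normally rely on a standard tool such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), one reference per segment, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than (or as long as) the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def d_bleu(hyp_docs, ref_docs):
    """Assumed d-BLEU: join each document's sentences and score whole documents as segments."""
    return corpus_bleu([" ".join(doc) for doc in hyp_docs],
                       [" ".join(doc) for doc in ref_docs])


# Toy documents: one reference chapter, a literal copy, and a faithful paraphrase.
reference = [["the cat sat on the mat .", "it was happy ."]]
literal = [["the cat sat on the mat .", "it was happy ."]]
paraphrase = [["a cat was sitting on the mat .", "it seemed happy ."]]

print(d_bleu(literal, reference))     # 100.0
print(d_bleu(paraphrase, reference))  # well below 100
```

The paraphrase shares fewer exact n-grams with the lone reference, so it scores far below the literal copy even though both convey the same content; this is the kind of effect the lower d-BLEU of more diverse outputs reflects.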

Abstract

Literary translation remains one of the most challenging frontiers in machine translation due to the complexity of capturing figurative language, cultural nuances, and unique stylistic elements. In this work, we introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company, including a CEO, Senior Editor, Junior Editor, Translator, Localization Specialist, and Proofreader. The translation process is divided into two stages: a preparation stage, in which the team is assembled and comprehensive translation guidelines are drafted, and an execution stage, which involves sequential translation, localization, proofreading, and a final quality check. Furthermore, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP), which evaluates translations based solely on target-language quality and cultural appropriateness, and Bilingual LLM Preference (BLP), which uses large language models such as GPT-4 to compare translations directly. Although TransAgents achieves lower d-BLEU scores, owing to the limited diversity of the references, its translations are significantly better than those of other baselines and are preferred by human evaluators and LLMs alike over both the human-written references and GPT-4 translations. Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.
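The execution stage described above is a fixed sequence of agent calls that share the guidelines drafted during preparation. The Python sketch below shows only that control flow, under explicit assumptions: each agent is modeled as a plain function of (guidelines, text), the `TranslationTeam` class and its `prepare`/`execute` methods are hypothetical, team assembly and the inter-agent collaboration strategies (Addition-by-Subtraction, Trilateral Collaboration) are omitted, and the final quality check simply reports pass or fail. In TransAgents itself, as the Translator prompt in Figure 3 suggests, each agent call would be an LLM prompt combining the guidelines, the chapter text, and an instruction.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical agent interface: (guidelines, text) -> revised text.
# A real agent would build an LLM prompt from its profile, the guidelines, and the text.
AgentFn = Callable[[str, str], str]


@dataclass
class TranslationTeam:
    """Illustrative two-stage workflow; not the TransAgents implementation."""
    guidelines: str = ""
    log: List[str] = field(default_factory=list)

    def prepare(self, draft_guidelines: Callable[[str], str], book_summary: str) -> None:
        # Preparation stage: draft the translation guidelines once, before any chapter.
        self.guidelines = draft_guidelines(book_summary)
        self.log.append("prepare")

    def execute(self, chapter: str, translator: AgentFn, localizer: AgentFn,
                proofreader: AgentFn,
                quality_check: Callable[[str, str], bool]) -> Tuple[str, bool]:
        # Execution stage: translation, localization, and proofreading run in sequence,
        # each step receiving the shared guidelines and the previous step's output.
        text = chapter
        steps = (("translate", translator), ("localize", localizer), ("proofread", proofreader))
        for name, agent in steps:
            text = agent(self.guidelines, text)
            self.log.append(name)
        # Final quality check: report pass/fail; handling of failures is left open here.
        passed = quality_check(self.guidelines, text)
        self.log.append("quality_check")
        return text, passed


# Toy usage with stub agents standing in for LLM calls.
team = TranslationTeam()
team.prepare(lambda summary: "Keep character names; formal register.", "A short novel.")
result, ok = team.execute(
    "chapter one text",
    translator=lambda g, t: f"[EN] {t}",
    localizer=lambda g, t: t + " (localized)",
    proofreader=lambda g, t: t.strip(),
    quality_check=lambda g, t: bool(t),
)
print(result, ok)  # [EN] chapter one text (localized) True
print(team.log)    # ['prepare', 'translate', 'localize', 'proofread', 'quality_check']
```

Storing the guidelines once on the team object and passing them to every agent mirrors the idea that they are drafted during preparation and then reused for every chapter.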

Paper Structure (51 sections, 11 figures, 12 tables, 2 algorithms)

Introduction
Related Work
Machine Translation
Multi-Agent Systems
Ours
TransAgents: A Multi-Agent Company for Literary Translation
Company Organization
Agentization
Translation Workflow
Preparation
Project Member Selection
Addition-by-Subtraction Collaboration
Long-Term Memory Management
Execution
Trilateral Collaboration
...and 36 more sections

Figures (11)

Figure 1: An example profile of Senior Editor.
Figure 2: The workflow of TransAgents.
Figure 3: An example prompt for the Translator, including the translation guidelines, the chapter text in the source language, and the instruction.
Figure 4: The prompt used for bilingual LLM preference evaluation.
Figure 5: Monolingual Human Preference evaluation results. GPT-4 indicates gpt-4-1106-preview.
...and 6 more figures
