LaTeXTrans: Structured LaTeX Translation with Multi-Agent Coordination

Ziming Zhu, Chenglong Wang, Shunjie Xing, Yifu Huo, Fengning Tian, Quan Du, Di Yang, Chunliang Zhang, Tong Xiao, Jingbo Zhu

TL;DR

LaTeXTrans addresses the challenge of translating LaTeX-formatted documents by working directly on the LaTeX source with a collaborative multi-agent pipeline that preserves structure and semantics. It deploys three modules (Parser, Translation, Generator) and six specialized agents that decompose, translate, validate, summarize, and maintain terminology; it then reconstructs the target-language LaTeX source and compiles it to PDF. Experimental results on arXiv-derived LaTeX content show that LaTeXTrans outperforms traditional MT and single-agent baselines in both translation quality and format fidelity, with notable gains in FC-score and COMETKiwi, especially when GPT-4o is the backbone. The work provides a practical, open-source solution for high-fidelity LaTeX document translation and highlights the benefits of structured, context-aware, and terminology-consistent translation of formatted text.
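The translate-validate-retry cycle of the Translation module can be pictured with a small sketch. This is an illustration under stated assumptions, not the LaTeXTrans implementation: the `<PHn>` placeholder format, the helpers `validate` and `translate_unit`, and the `translate` callable (standing in for an LLM-backed Translator) are all hypothetical. The validator here only checks that placeholders survive and that glossary terms (as a Terminology Extractor might supply them) are used; the Summarizer's document context is left out.

```python
import re
from typing import Callable

# Hypothetical placeholder format produced by an upstream parser.
PLACEHOLDER = re.compile(r"<PH\d+>")

# translate(source, glossary, feedback) -> translation; stands in for an LLM call.
Translator = Callable[[str, dict[str, str], list[str]], str]


def validate(source: str, translation: str, glossary: dict[str, str]) -> list[str]:
    """Return problems found in a candidate translation (empty list = accepted)."""
    problems = []
    expected = sorted(PLACEHOLDER.findall(source))
    found = sorted(PLACEHOLDER.findall(translation))
    if expected != found:
        problems.append(f"placeholder mismatch: expected {expected}, got {found}")
    for term, target in glossary.items():
        # Enforce the glossary rendering whenever the term occurs in the source.
        if term.lower() in source.lower() and target not in translation:
            problems.append(f"render {term!r} as {target!r}")
    return problems


def translate_unit(source: str, translate: Translator,
                   glossary: dict[str, str], max_rounds: int = 3) -> str:
    """Translate, validate, and retry with the validator's feedback."""
    feedback: list[str] = []
    translation = ""
    for _ in range(max_rounds):
        translation = translate(source, glossary, feedback)
        feedback = validate(source, translation, glossary)
        if not feedback:
            break
    # If every round failed validation, the last attempt is returned as a best effort.
    return translation


if __name__ == "__main__":
    def fake_llm(source: str, glossary: dict[str, str], feedback: list[str]) -> str:
        # First attempt drops the placeholder; the retry (with feedback) restores it.
        if not feedback:
            return "Wir verwenden den Aufmerksamkeitsmechanismus."
        return "Wir verwenden den Aufmerksamkeitsmechanismus <PH1>."

    print(translate_unit("We use the attention mechanism <PH1>.", fake_llm,
                         {"attention mechanism": "Aufmerksamkeitsmechanismus"}))
```

Passing the validator's messages back into the next call is what makes the loop self-correcting: the translator sees exactly which placeholder or term it got wrong instead of retrying blindly.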

Abstract

Despite the remarkable progress of modern machine translation (MT) systems on general-domain texts, translating structured LaTeX-formatted documents remains a significant challenge. These documents typically interleave natural language with domain-specific syntax, such as mathematical equations, tables, figures, and cross-references, all of which must be accurately preserved to maintain semantic integrity and compilability. In this paper, we introduce LaTeXTrans, a collaborative multi-agent system designed to address this challenge. LaTeXTrans ensures format preservation, structural fidelity, and terminology consistency through six specialized agents: 1) a Parser that decomposes LaTeX into translation-friendly units via placeholder substitution and syntax filtering; 2) a Translator, Validator, Summarizer, and Terminology Extractor that work collaboratively to ensure context-aware, self-correcting, and terminology-consistent translations; 3) a Generator that reconstructs the translated content into well-structured LaTeX documents. Experimental results demonstrate that LaTeXTrans can outperform mainstream MT systems in both translation accuracy and structural fidelity, offering an effective and practical solution for translating LaTeX-formatted documents. The code of LaTeXTrans is available at https://github.com/NiuTrans/LaTeXTrans.
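The Parser's placeholder substitution and the Generator's reconstruction can be illustrated with a minimal sketch. It assumes a hypothetical `<PHn>` placeholder format and a deliberately narrow syntax filter (inline and display math, `\label`, `\ref`, `\eqref`, `\cite`); the names `substitute` and `restore` are illustrative and are not taken from the LaTeXTrans code. A real parser would need to cover much more LaTeX syntax, such as environments, tables, and escaped `\$`.

```python
import re

# Hypothetical syntax filter: constructs that must not be sent to the translator.
# Escaped dollars (\$), environments, and nested braces are not handled.
PROTECTED = re.compile(
    r"\$\$.*?\$\$"                            # display math $$...$$
    r"|\$[^$]*\$"                             # inline math $...$
    r"|\\(?:label|ref|eqref|cite)\{[^}]*\}",  # labels, references, citations
    re.DOTALL,
)
PLACEHOLDER = re.compile(r"<PH\d+>")


def substitute(latex: str) -> tuple[str, dict[str, str]]:
    """Replace protected spans with numbered placeholders; return text and mapping."""
    mapping: dict[str, str] = {}

    def to_placeholder(match: re.Match) -> str:
        key = f"<PH{len(mapping) + 1}>"
        mapping[key] = match.group(0)
        return key

    return PROTECTED.sub(to_placeholder, latex), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original LaTeX back; unknown placeholders are left untouched."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    src = r"Attention (Eq.~\ref{eq:attn}) computes $\mathrm{softmax}(QK^\top)$ \cite{vaswani2017}."
    masked, mapping = substitute(src)
    print(masked)  # Attention (Eq.~<PH1>) computes <PH2> <PH3>.
    # A translator would work on `masked`; here the German output is hard-coded.
    print(restore("Attention (Gl.~<PH1>) berechnet <PH2> <PH3>.", mapping))
```

Because only the masked text reaches the translator, math and cross-reference keys cannot be paraphrased; the mapping is what lets the reconstruction step splice them back in verbatim.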

Paper Structure

This paper contains 31 sections, 1 equation, 17 figures, 2 tables.

Figures (17)

  • Figure 1: The architecture of our LaTeXTrans system.
  • Figure 2: The pipeline of our placeholder substitution strategy. The mapping files record each placeholder together with the content it replaces; these entries also serve as translation units of different granularities.
  • Figure 3: Comparison of translation quality in two representative cases between the baseline and LaTeXTrans. In the LaTeX source, blue text marks labels that should be preserved. A red question mark ("?") indicates label loss during translation. Red highlights inconsistent translations, green indicates consistent ones, and orange shows LaTeX labels missed by the baseline but successfully preserved by LaTeXTrans.
  • Figure 4: Distribution of paper lengths (in word count) in our test set.
  • Figure 5: Word cloud visualization of topics covered in our test set.
  • ...and 12 more figures
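Figure 3 highlights LaTeX labels lost by the baseline. One simple way to detect that kind of loss, sketched here as an assumption rather than the paper's evaluation procedure, is to count the keyed commands (`\label`, `\ref`, `\eqref`, `\cite`) in the source and in the translated LaTeX and report any that went missing. The helper names `keyed_commands` and `lost_labels` are hypothetical, and the regex ignores optional arguments such as `\cite[p.~5]{key}` and nested braces.

```python
import re
from collections import Counter

# Hypothetical filter: commands whose keys must survive translation verbatim.
# Optional arguments (e.g. \cite[p.~5]{key}) and nested braces are not handled.
KEYED = re.compile(r"\\(label|ref|eqref|cite)\{([^}]*)\}")


def keyed_commands(latex: str) -> Counter:
    """Count (command, key) pairs such as ("ref", "tab:main")."""
    return Counter(KEYED.findall(latex))


def lost_labels(source: str, translated: str) -> Counter:
    """Return (command, key) pairs that occur fewer times in the translation.

    Counter subtraction keeps only positive counts, so commands that appear
    only in the translation are ignored here.
    """
    return keyed_commands(source) - keyed_commands(translated)


if __name__ == "__main__":
    src = r"See Table~\ref{tab:main} and \cite{a,b}.\label{sec:res}"
    out = r"Siehe Tabelle~? und \cite{a,b}.\label{sec:res}"
    print(lost_labels(src, out))  # Counter({('ref', 'tab:main'): 1})
```

Counting pairs rather than comparing sets also catches a label that appears twice in the source but only once in the translation.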