Discourse Graph Guided Document Translation with Large Language Models

Viet-Thanh Pham; Minghan Wang; Hao-Han Liao; Thuy-Trang Vu

TL;DR

TransGraph is a discourse-graph guided approach to document-level machine translation (DocMT). It partitions a document into coherent chunks, builds a labeled discourse graph over them, and conditions each chunk's translation on a small graph-neighbourhood context rather than on the full document, which improves d-BLEU, d-COMET, and terminology accuracy while reducing token overhead. Across three benchmarks and multiple LLM backbones, TransGraph outperforms sentence-level, single-pass, and agent-based baselines. Ablations confirm the contribution of coherent chunking, explicit discourse relations, and graph structure. The gains hold across backbones and languages, which points to structured discourse retrieval as a practical lever for high-quality DocMT.

Abstract

Adapting large language models to full document translation remains challenging due to the difficulty of capturing long-range dependencies and preserving discourse coherence throughout extended texts. While recent agentic machine translation systems mitigate context window constraints through multi-agent orchestration and persistent memory, they require substantial computational resources and are sensitive to memory retrieval strategies. We introduce TransGraph, a discourse-guided framework that explicitly models inter-chunk relationships through structured discourse graphs and selectively conditions each translation segment on relevant graph neighbourhoods rather than relying on sequential or exhaustive context. Across three document-level MT benchmarks spanning six languages and diverse domains, TransGraph consistently surpasses strong baselines in translation quality and terminology consistency while incurring significantly lower token overhead.
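To make the idea of conditioning on a graph neighbourhood concrete, the following is a minimal sketch and not the authors' implementation. It assumes chunks are already segmented and discourse relations already labeled. The names `DiscourseGraph`, `add_relation`, `neighbourhood`, and `build_prompt`, the relation labels, the cap of two neighbours, and the ranking by distance in document order are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DiscourseGraph:
    """Chunks as nodes, labeled discourse relations as undirected edges (assumed design)."""
    chunks: list[str]
    edges: dict[int, list[tuple[int, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add_relation(self, src: int, dst: int, label: str) -> None:
        # Store the relation in both directions so either chunk can retrieve the other.
        self.edges[src].append((dst, label))
        self.edges[dst].append((src, label))

    def neighbourhood(self, node: int, max_neighbours: int = 2) -> list[tuple[int, str]]:
        # Hypothetical ranking: prefer related chunks closest in document order,
        # then cap the list so the prompt stays small.
        ranked = sorted(self.edges.get(node, []), key=lambda e: abs(e[0] - node))
        return ranked[:max_neighbours]


def build_prompt(graph: DiscourseGraph, node: int, target_lang: str) -> str:
    """Assemble a prompt from the chunk's graph neighbours only, not the whole document."""
    lines = [f"Translate the final chunk into {target_lang}."]
    for idx, label in graph.neighbourhood(node):
        lines.append(f"[context, relation={label}] {graph.chunks[idx]}")
    lines.append(f"[translate] {graph.chunks[node]}")
    return "\n".join(lines)


# Toy document with four chunks and three labeled relations (labels are made up).
doc = DiscourseGraph(chunks=["Intro.", "Method.", "Results.", "Discussion."])
doc.add_relation(0, 3, "coreference")
doc.add_relation(2, 3, "elaboration")
doc.add_relation(1, 3, "contrast")
prompt = build_prompt(doc, node=3, target_lang="German")
```

In this toy case the prompt for the last chunk carries only two related chunks as context, so the token cost depends on the neighbourhood cap rather than on document length.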
