Paper2SysArch: Structure-Constrained System Architecture Generation from Scientific Papers
Ziyi Guo, Zhou Liu, Wentao Zhang
TL;DR
This work addresses the lack of standardized evaluation for generating system-architecture diagrams from scientific papers. It proposes the Paper2SysArch Benchmark, a large-scale dataset of 3,000 paper–diagram pairs with a three-tier evaluation framework covering semantics, layout, and visual quality. It also introduces Paper2SysArch, an end-to-end multi-agent system that converts papers into editable, structured diagrams using a hierarchical three-layer graph representation and a distributed generation pipeline. The benchmark emphasizes structure-centric semantics via a machine-readable GraphJSON ground truth, enabling reproducible and fair comparisons across methods. The agent-based approach performs strongly on visual and layout quality, while semantic fidelity remains the main challenge. The results point to controllable, automated scientific visualization as a promising direction, with semantic reconstruction and layout flexibility as the main areas for future improvement.
Abstract
Manually creating system architecture diagrams for scientific papers is time-consuming and subjective, and existing generative models lack the structural control and semantic understanding the task requires. A primary obstacle to research in this domain has been the lack of a standardized benchmark for quantitatively evaluating automated diagram generation from text. To address this gap, we introduce a comprehensive benchmark, the first of its kind, for automated scientific visualization. It consists of 3,000 research papers paired with their corresponding high-quality ground-truth diagrams and is accompanied by a three-tiered evaluation metric assessing semantic accuracy, layout coherence, and visual quality. To establish a strong baseline on this benchmark, we propose Paper2SysArch, an end-to-end system that uses multi-agent collaboration to convert papers into structured, editable diagrams. To validate its performance on complex cases, we evaluated the system on a manually curated, more challenging subset of these papers, where it achieves a composite score of 69.0. The principal contribution of this work is a large-scale, foundational benchmark that enables reproducible research and fair comparison. The proposed system serves as a viable proof of concept, demonstrating a promising path forward for this complex task.
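
As a rough illustration of how a three-tiered metric (semantic accuracy, layout coherence, visual quality) might be folded into a single composite score, the sketch below computes a weighted mean. The abstract does not state the aggregation rule, the weights, or the score scale. The names `TierScores` and `composite_score`, the equal default weights, and the 0-100 scale are all assumptions made for this example, not the authors' definition.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TierScores:
    """Per-tier scores for one generated diagram (0-100 scale is assumed)."""
    semantic: float
    layout: float
    visual: float


def composite_score(scores: TierScores,
                    weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the three tier scores.

    The equal default weights are a placeholder assumption; the actual
    aggregation used by the benchmark is not given in the abstract.
    """
    if len(weights) != 3:
        raise ValueError("expected exactly three weights")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    values = (scores.semantic, scores.layout, scores.visual)
    # Normalize by the weight total so weights need not sum to 1.
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For example, `composite_score(TierScores(semantic=60, layout=70, visual=80))` averages the three tiers equally, while passing `weights=(2, 1, 1)` would emphasize semantic accuracy, the tier the TL;DR identifies as the main remaining challenge.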
