PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG
Tao Yu, Minghui Zhang, Zhiqing Cui, Hao Wang, Zhongtian Luo, Shenghua Chai, Junhao Gong, Yuzhao Peng, Yuxuan Zhou, Yujia Yang, Zhenghao Zhang, Haopeng Jin, Xinming Wang, Yufei Xiong, Jiabing Yang, Jiahao Yuan, Hanqing Wang, Hongzhu Yi, YiFan Zhang, Yan Huang, Liang Wang
TL;DR
PaperX tackles the fragmentation of automated scientific presentation generation by introducing Scholar DAG, a structured intermediate representation that separates semantic content from modality-specific rendering. The two-stage pipeline (Paper2DAG and DAG2Scholar) converts a paper into a hierarchical, cross-modal graph and then renders PPTs, posters, and PR content via modality-specific graph traversal and refinement. The framework achieves state-of-the-art content fidelity, aesthetics, and efficiency across PPTEval, Paper2Poster, and PRBench, and demonstrates extensibility through integration with external rendering systems. This approach promises scalable, coherent, and cost-effective dissemination of scientific knowledge across diverse formats.
Abstract
Transforming scientific papers into multimodal presentation content is essential for research dissemination but remains labor-intensive. Existing automated solutions typically treat each format as an isolated downstream task, leading to redundant processing and semantic inconsistency. We introduce PaperX, a unified framework that models academic presentation generation as a structural transformation and rendering process. Central to our approach is the Scholar DAG, an intermediate representation that decouples the paper's logical structure from its final presentation syntax. By applying adaptive graph traversal strategies, PaperX generates diverse, high-quality outputs from a single source. Comprehensive evaluations demonstrate that our framework achieves state-of-the-art performance in content fidelity and aesthetic quality while significantly improving cost efficiency compared to specialized single-task agents.
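To make the idea of one shared representation with several traversals concrete, the following is a minimal illustrative sketch, not the PaperX implementation. The `Node` schema, the `walk` helper, and the three `render_*` functions are hypothetical names chosen for illustration, and a simple tree (a special case of a DAG) stands in for the Scholar DAG. It only shows how different traversal depths over one graph could yield slide, poster, and PR-style outputs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One semantic unit of the paper (hypothetical schema, not from the paper)."""
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def walk(node, depth=0, max_depth=None):
    """Pre-order traversal yielding (depth, node), optionally depth-limited."""
    if max_depth is not None and depth > max_depth:
        return
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1, max_depth)


def render_slides(root):
    """Slides: visit every node, keeping the full hierarchy."""
    return [f"Slide: {n.title} | {n.summary}" for _, n in walk(root)]


def render_poster(root):
    """Poster: keep only the root and its top-level sections."""
    return [f"Panel: {n.title}" for _, n in walk(root, max_depth=1)]


def render_pr(root):
    """PR blurb: root summary plus the titles of top-level sections."""
    sections = ", ".join(c.title for c in root.children)
    return f"{root.summary} Covers: {sections}."


# Toy graph built by hand; in the framework this would come from Paper2DAG.
paper = Node("PaperX", "A unified framework for presentation generation.", [
    Node("Scholar DAG", "Intermediate representation.", [
        Node("Paper2DAG", "Builds the graph from the paper."),
    ]),
    Node("DAG2Scholar", "Renders outputs by graph traversal."),
])

for line in render_slides(paper):
    print(line)
print(render_poster(paper))
print(render_pr(paper))
```

In this sketch the paper is parsed once and every output format reads from the same nodes, which is the property the abstract credits for reducing redundant processing and semantic inconsistency.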
