P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark
Tao Sun, Enhao Pan, Zhengkai Yang, Kaixin Sui, Jiajun Shi, Xianfu Cheng, Tongliang Li, Wenhao Huang, Ge Zhang, Jian Yang, Zhoujun Li
TL;DR
P2P tackles the problem of automated academic poster generation by introducing a flexible, three-agent framework that renders HTML posters directly from research papers. The Figure, Section, and Orchestrate agents operate with dedicated checkers and reflection loops to ensure visual accuracy, content fidelity, and cohesive layout; the poster text $P_{\text{poster\_text}}$ is generated from a structural schema $S$ and visual descriptions $F_d$. The authors contribute the P2PInstruct dataset (over $3.0 \times 10^4$ instruction-response pairs) and the P2PEval benchmark (121 paper-poster pairs) with Universal and Fine-Grained evaluation using LLMs as judges and human-annotated checklists. Empirical results show that P2P, especially when augmented with reasoning capabilities and P2PInstruct fine-tuning (e.g., Qwen3-P2P-8B), achieves high lexical and fidelity scores, often matching or surpassing human posters in preference studies. This work establishes a practical, evaluable foundation for automated paper-to-poster systems and enables robust future research in automated scientific communication.
Abstract
Academic posters are vital for scholarly communication, yet their manual creation is time-consuming. Automated academic poster generation, however, faces significant challenges in preserving intricate scientific details and achieving effective visual-textual integration. Existing approaches often struggle with semantic richness and structural nuances, and lack standardized benchmarks for evaluating generated academic posters comprehensively. To address these limitations, we introduce P2P, the first flexible, LLM-based multi-agent framework that generates high-quality, HTML-rendered academic posters directly from research papers, demonstrating strong potential for practical applications. P2P employs three specialized agents (for visual element processing, content generation, and final poster assembly), each integrated with dedicated checker modules to enable iterative refinement and ensure output quality. To foster advancements and rigorous evaluation in this domain, we construct and release P2PInstruct, the first large-scale instruction dataset comprising over 30,000 high-quality examples tailored for the academic paper-to-poster generation task. Furthermore, we establish P2PEval, a comprehensive benchmark featuring 121 paper-poster pairs and a dual evaluation methodology (Universal and Fine-Grained) that leverages LLM-as-a-Judge and detailed, human-annotated checklists. Our contributions aim to streamline research dissemination and provide the community with robust tools for developing and evaluating next-generation poster generation systems.
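The agent-plus-checker pattern described in the abstract can be pictured as a generate, check, and retry loop. The following is only a minimal sketch of that general idea, not the authors' implementation: the names `refine_with_checker`, `toy_section_generator`, and `toy_length_checker`, the round limit, and the 40-character budget are all hypothetical, and a real system would presumably call an LLM where the toy functions return fixed strings.

```python
from typing import Callable, Optional, Tuple

# Hypothetical type aliases: a generator takes optional checker feedback and
# returns a draft; a checker returns (accepted, feedback_message).
Generator = Callable[[Optional[str]], str]
Checker = Callable[[str], Tuple[bool, str]]


def refine_with_checker(
    generate: Generator, check: Checker, max_rounds: int = 3
) -> Tuple[str, int]:
    """Run generate -> check until the checker accepts or rounds run out.

    Returns the last draft and the number of rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)          # agent produces or revises a draft
        accepted, feedback = check(draft)   # checker validates and comments
        if accepted:
            return draft, round_no
    return draft, max_rounds                # best effort after the last round


def toy_section_generator(feedback: Optional[str]) -> str:
    """Stand-in for a content agent (assumption: a real one would call an LLM)."""
    if feedback is None:
        return "P2P renders HTML posters from papers using three agents with checkers."
    return "P2P renders HTML posters from papers."


def toy_length_checker(draft: str, limit: int = 40) -> Tuple[bool, str]:
    """Stand-in for a checker module: accept drafts within a length budget."""
    if len(draft) <= limit:
        return True, ""
    return False, f"Shorten to at most {limit} characters."


section_text, rounds_used = refine_with_checker(
    toy_section_generator, toy_length_checker
)
```

In this toy run the first draft exceeds the length budget, so the checker's feedback triggers one revision before the draft is accepted. The same loop could in principle wrap each of the three stages (visual elements, section content, final assembly) with its own checker.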
