Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
Chengzhi Liu, Yuzhe Yang, Kaiwen Zhou, Zhen Zhang, Yue Fan, Yanan Xie, Peng Qi, Xin Eric Wang
TL;DR
The paper tackles the problem of effectively promoting academic work through automated, narrative-rich, aesthetically aware presentations. It introduces EvoPresent, a self-improvement multi-agent framework that iteratively refines content, design, and delivery under the guidance of PresAesth, a multi-task reinforcement learning model trained with GRPO on limited aesthetic data. A two-part EvoPresent Benchmark, covering Presentation Generation Quality and Aesthetic Awareness, enables systematic evaluation across 650 papers and 2,000 slide pairs. Key findings show that multi-task GRPO generalizes well to aesthetic tasks, that high-quality feedback accelerates self-correction, and that there is a trade-off between content construction and visual design. EvoPresent consistently outperforms baselines in both narrative quality and aesthetics, approaching the quality of human-designed presentations. Together, these contributions move automated presentation generation toward scalable, engaging dissemination of research with minimal human intervention. The learning signals use $\alpha=0.5$, $\zeta=0.25$, and $\beta=0.001$ as thresholds.
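GRPO (Group Relative Policy Optimization) replaces a learned value baseline with a group baseline: for each prompt, several responses are sampled, and each response's reward is normalized by the mean and standard deviation of its group. The sketch below shows only that standard advantage step, under the assumption of a simple scalar reward per response. The function name `group_relative_advantages`, the `eps` guard, and the 0/1 example rewards are illustrative; the sketch does not reproduce PresAesth's multi-task rewards or the roles of $\alpha$, $\zeta$, and $\beta$, which the summary does not spell out.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward within its sampled group (standard GRPO advantage).

    Uses the population standard deviation; some implementations use the
    sample standard deviation instead. `eps` avoids division by zero when
    every response in the group receives the same reward.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled responses to one prompt, rewarded 1 or 0.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print([round(a, 6) for a in advantages])  # [1.0, -1.0, 1.0, -1.0]
```

A group where two responses earn reward 1 and two earn 0 yields advantages of roughly +1 and -1, while a group with identical rewards yields all-zero advantages and so contributes no advantage-weighted signal to the update.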
Abstract
The promotion of academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle with limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making efficient and engaging dissemination difficult. At the heart of these challenges is a simple principle: \emph{you cannot improve what you cannot evaluate correctly}. To address this, we introduce \textbf{EvoPresent}, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is \textbf{PresAesth}, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even with limited aesthetic training data. To evaluate these methods systematically, we introduce the \textbf{EvoPresent Benchmark}, which comprises two parts: \textit{Presentation Generation Quality}, built on 650 top-tier AI conference papers with multimodal resources (slides, videos, and scripts) to assess both content and design; and \textit{Aesthetic Awareness}, consisting of 2,000 slide pairs with varying aesthetic levels that support joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) high-quality feedback is essential for agent self-improvement, whereas initial capability alone does not guarantee effective self-correction; (ii) automated generation pipelines exhibit a trade-off between visual design and content construction; and (iii) multi-task RL training shows stronger generalization on aesthetic awareness tasks.
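One way to read the comparison task in the Aesthetic Awareness split is as pairwise preference accuracy: a model scores both slides in a pair, and the prediction counts as correct when the higher-scored slide is the one labeled better. The sketch below assumes that framing. `SlidePair`, `pairwise_accuracy`, the tie rule, and the example scores are all hypothetical and are not taken from the benchmark's released data format or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class SlidePair:
    """Hypothetical comparison record: a model score per slide plus a ground-truth label."""
    score_a: float      # predicted aesthetic score for slide A
    score_b: float      # predicted aesthetic score for slide B
    a_is_better: bool   # ground truth: True if slide A is the more aesthetic one


def pairwise_accuracy(pairs: list[SlidePair]) -> float:
    """Fraction of pairs whose higher-scored slide matches the ground-truth winner.

    Ties are counted as incorrect, since they express no preference.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = 0
    for p in pairs:
        if p.score_a == p.score_b:
            continue  # tie: no prediction, counted as wrong
        predicted_a_better = p.score_a > p.score_b
        if predicted_a_better == p.a_is_better:
            correct += 1
    return correct / len(pairs)


pairs = [
    SlidePair(7.5, 4.0, True),   # correct
    SlidePair(3.0, 6.0, False),  # correct
    SlidePair(5.0, 5.0, True),   # tie, counted as wrong
    SlidePair(8.0, 2.0, False),  # wrong
]
print(pairwise_accuracy(pairs))  # 0.5
```

Counting ties as errors is one design choice among several; an evaluation could instead award half credit for ties or require the model to output an explicit preference.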
