Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations

Chengzhi Liu, Yuzhe Yang, Kaiwen Zhou, Zhen Zhang, Yue Fan, Yanan Xie, Peng Qi, Xin Eric Wang

TL;DR

The paper tackles the problem of effectively promoting academic work through automated, narrative-rich, aesthetically aware presentations. It introduces EvoPresent, a self-improvement multi-agent framework guided by PresAesth, a multi-task reinforcement learning model trained with GRPO on limited aesthetic data, to iteratively refine content, design, and delivery. A two-part EvoPresent Benchmark (Presentation Generation Quality and Aesthetic Awareness) enables systematic evaluation across 650 papers and 2,000 slide pairs. Key findings show that multi-task GRPO generalizes well to aesthetic tasks, that high-quality feedback accelerates self-correction, and that there is a trade-off between content construction and visual design; EvoPresent consistently outperforms baselines in both narrative quality and aesthetics, approaching human-designed presentations. Together, these contributions advance automated presentation generation toward scalable, engaging dissemination of research with minimal human intervention. The learning signals use the thresholds $\alpha=0.5$, $\zeta=0.25$, and $\beta=0.001$.
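As a rough illustration of the GRPO training signal mentioned above, the sketch below computes group-relative advantages in the standard way: each sampled response's reward is normalized by the mean and standard deviation of its own group. The function name, the use of the population standard deviation, and the `eps` guard are assumptions made for this example; the paper's actual reward design, and the precise roles of $\alpha$, $\zeta$, and $\beta$, are not specified in this summary.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group.

    Hypothetical helper: GRPO scores a group of responses to the same
    prompt and uses (reward - group mean) / group std as the advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one aesthetic-scoring prompt (made-up rewards)
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that beat their group average receive a positive advantage and the rest a negative one, so no separate value network is needed to form the baseline.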

Abstract

The promotion of academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle with limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making it difficult to achieve efficient and engaging dissemination. At the heart of these challenges is a simple principle: \emph{there is no way to improve it when you cannot evaluate it right}. To address this, we introduce \textbf{EvoPresent}, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is \textbf{PresAesth}, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even under limited aesthetic training data. To systematically evaluate the methods, we introduce \textbf{EvoPresent Benchmark}, a comprehensive benchmark comprising: \textit{Presentation Generation Quality}, built on 650 top-tier AI conference papers with multimodal resources (slides, videos and scripts) to assess both content and design; and \textit{Aesthetic Awareness}, consisting of 2,000 slide pairs with varying aesthetic levels, supporting joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) high-quality feedback is essential for agent self-improvement, while initial capability alone does not guarantee effective self-correction; (ii) automated generation pipelines exhibit a trade-off between visual design and content construction; and (iii) multi-task RL training shows stronger generalization in aesthetic awareness tasks.
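The iterative self-improvement described in the abstract (score a slide, list its defects, revise, then compare) can be sketched as a simple critic-guided loop. Everything here is hypothetical and for illustration only: the `Feedback` type, the `self_improve` function, the score scale, the `target` and `max_iters` parameters, and the toy critic and reviser standing in for PresAesth and the design agent. It is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Slide = Dict[str, object]  # hypothetical slide representation


@dataclass
class Feedback:
    score: float                                       # aesthetic score (scale assumed)
    defects: List[str] = field(default_factory=list)   # detected design defects


def self_improve(
    slide: Slide,
    critic: Callable[[Slide], Feedback],
    revise: Callable[[Slide, List[str]], Slide],
    target: float = 8.0,
    max_iters: int = 3,
) -> Tuple[Slide, List[float]]:
    """Revise a slide until the critic is satisfied or the budget runs out."""
    best, best_fb = slide, critic(slide)
    history = [best_fb.score]
    for _ in range(max_iters):
        if best_fb.score >= target or not best_fb.defects:
            break
        candidate = revise(best, best_fb.defects)   # defect adjustment
        cand_fb = critic(candidate)                 # scoring
        history.append(cand_fb.score)
        if cand_fb.score > best_fb.score:           # comparison: keep the better slide
            best, best_fb = candidate, cand_fb
    return best, history


def toy_critic(slide: Slide) -> Feedback:
    """Stand-in critic: penalizes overlap and small fonts."""
    defects = []
    if slide["overlap"]:
        defects.append("overlap")
    if slide["font_pt"] < 18:
        defects.append("small_font")
    return Feedback(10.0 - 3.0 * len(defects), defects)


def toy_revise(slide: Slide, defects: List[str]) -> Slide:
    """Stand-in reviser: fixes only the first reported defect."""
    fixed = dict(slide)
    if defects[0] == "overlap":
        fixed["overlap"] = False
    elif defects[0] == "small_font":
        fixed["font_pt"] = 18
    return fixed


final_slide, scores = self_improve({"overlap": True, "font_pt": 10}, toy_critic, toy_revise)
```

The loop only ever accepts a candidate that the critic rates higher, which mirrors the abstract's point that the quality of the feedback, rather than the initial generation alone, drives the improvement.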

Paper Structure

This paper contains 28 sections, 8 equations, 15 figures, 5 tables, 1 algorithm.

Figures (15)

  • Figure 1: Comparison between EvoPresent and other methods. (a) EvoPresent achieves high quality with fewer iterations through its self-improvement framework, supporting multiple formats (videos, scripts, slides) for a more realistic presentation. (b) PPTAgent (Zheng et al., 2025) and PresentAgent (Shi et al., 2025) lack content expressiveness and are limited by fixed templates. (c) Paper2Poster (Pang et al., 2025) lacks flexibility and an effective visual checker, leading to poor visual design and requiring extensive adjustments.
  • Figure 2: Overview of the EvoPresent framework. (a) EvoPresent first performs content extraction and voice generation, then constructs the storyline and script, followed by content enhancement using image generation and knowledge retrieval. Design and rendering are handled next, and the aesthetic checker evaluates the initial slide and provides adjustments. (b) PresAesth is trained on a human-preference aesthetic dataset via multiple tasks (scoring, defect adjustment, and comparison). (c) The PresAesth model guides the agent framework in iterative self-improvement.
  • Figure 3: Data Statistics for our Benchmark. (a) The categories of papers across different venues. (b) Distribution of presentation videos and scripts, including slide counts, video duration, average slide frame time, and slide script tokens. (c) The overall scores and deficiency categories of aesthetic awareness data.
  • Figure 4: Evaluation of the presentation experience. (a) Video performance assessed on 4 dimensions. (b) Content delivery evaluated with verbatim and explanatory questions.
  • Figure 5: Illustration of presentation variants by different methods: (a) Author-designed, (b) Our EvoPresent, (c) PresentAgent, (d) Paper2Poster, (e) GPT5-HTML (web-based), (f) GPT-4o-Image (pixel-based). The figure highlights several common design deficiencies marked with colored boxes: (1) overlap issues, (2) content errors, (3) typography defects, and (4) unbalanced layout design. The results indicate that existing generation methods generally exhibit deficiencies in aesthetic design, whereas our method achieves the closest visual alignment with the human-designed reference.
  • ...and 10 more figures