DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

Hao Zheng, Guozhao Mo, Xinru Yan, Qianhao Yuan, Wenkai Zhang, Xuanang Chen, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

TL;DR

DeepPresenter is an agentic framework for presentation generation that adapts to diverse user intents, enables effective feedback-driven refinement, generalizes beyond a scripted pipeline, and achieves state-of-the-art performance.

Abstract

Presentation generation requires deep content research, coherent visual design, and iterative refinement based on observation. However, existing presentation agents often rely on predefined workflows and fixed templates. To address this, we present DeepPresenter, an agentic framework that adapts to diverse user intents, enables effective feedback-driven refinement, and generalizes beyond a scripted pipeline. Specifically, DeepPresenter autonomously plans, renders, and revises intermediate slide artifacts to support long-horizon refinement with environmental observations. Furthermore, rather than relying on self-reflection over internal signals (e.g., reasoning traces), our environment-grounded reflection conditions the generation process on perceptual artifact states (e.g., rendered slides), enabling the system to identify and correct presentation-specific issues during execution. Results on the evaluation set covering diverse presentation-generation scenarios show that DeepPresenter achieves state-of-the-art performance, and the fine-tuned 9B model remains highly competitive at substantially lower cost. Our project is available at: https://github.com/icip-cas/PPTAgent
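
The abstract's render, observe, and revise cycle can be illustrated with a minimal sketch. This is not the authors' implementation: the `Observation` type, the `refine_with_environment` function, the round budget, and the toy `inspect`/`revise` callables are all hypothetical names chosen for illustration. The only idea taken from the text is that revision is driven by observations of the rendered artifact rather than by the agent's own reasoning trace.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Observation:
    """Hypothetical container for what the environment reports about an artifact."""
    defects: List[str] = field(default_factory=list)


def refine_with_environment(
    draft: str,
    inspect: Callable[[str], Observation],
    revise: Callable[[str, Observation], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Revise `draft` until `inspect` reports no defects or the budget is spent.

    Returns the final draft and the number of revisions applied.
    """
    revisions = 0
    for _ in range(max_rounds):
        observation = inspect(draft)  # grounded signal from the environment
        if not observation.defects:
            break  # nothing observed to fix, so stop early
        draft = revise(draft, observation)
        revisions += 1
    return draft, revisions


# Toy stand-ins (assumptions): a "slide" is a string, and the only defect
# the toy environment can observe is text that is too long to fit.
def toy_inspect(slide: str) -> Observation:
    return Observation(["text overflow"] if len(slide) > 40 else [])


def toy_revise(slide: str, observation: Observation) -> str:
    return slide[: len(slide) // 2]  # crude fix: halve the text


final_slide, rounds = refine_with_environment("x" * 100, toy_inspect, toy_revise)
print(len(final_slide), rounds)
```

In a real system the `inspect` callable would presumably examine a rendered slide image, and `revise` would be a model call conditioned on that observation; both are left abstract here.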

Paper Structure (44 sections, 8 figures, 8 tables)

This paper contains 44 sections, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Illustration of DeepPresenter. Given a user instruction, the Researcher gathers information and compiles a structured manuscript, while the Presenter transforms it into visual slides. Both agents interact and collaborate with a shared environment, leveraging grounded observations for reflective refinement.
  • Figure 2: Comparison between self-reflection and environment-grounded reflection. Self-reflection relies on uncertain triggers and inputs without external signals. DeepPresenter grounds reflection in environmental observations through the inspect tool.
  • Figure 3: Our data synthesis pipeline. The process ensures high-quality trajectories for supervised fine-tuning through three integrated mechanisms: (1) Query Construction augments tasks with verifiable constraints; (2) Extrinsic Verification injects reasoning traces when defects are identified to guide agent self-correction during sampling; and (3) Trajectory Filtering validates constraint compliance and assesses consistency and output quality.
  • Figure 4: Distribution of defects identified by self-verification and extrinsic verification for manuscripts (left) and slides (right), respectively.
  • Figure 5: Failure distribution in synthesized trajectories before filtering.
  • ...and 3 more figures