PresentCoach: Dual-Agent Presentation Coaching through Exemplars and Interactive Feedback

Sirui Chen, Jinsong Zhou, Xinli Xu, Xiaoyu Yang, Litao Guo, Ying-Cong Chen

TL;DR

PresentCoach addresses the gap in presentation training by integrating a slide-aware exemplar generator with a multimodal, audience-aware Coach Agent in a closed feedback loop. The Ideal Presentation Agent creates a personalized model presentation in the user’s own voice, while the Coach Agent provides Observation-Impact-Suggestion feedback and simulates audience responses to guide practice. A controlled study (N=24) shows that rehearsal with PresentCoach reduces public-speaking anxiety, maintains a moderate cognitive workload, and yields high usability, with participants perceiving clear role differentiation and synergistic benefits from the dual-agent interaction. The work demonstrates a scalable, human-centered approach to deliberate practice in educational and professional contexts, offering concrete targets and actionable guidance for improvement.

Abstract

Effective presentation skills are essential in education, professional communication, and public speaking, yet learners often lack access to high-quality exemplars or personalized coaching. Existing AI tools typically provide isolated functionalities such as speech scoring or script generation without integrating reference modeling and interactive feedback into a cohesive learning experience. We introduce a dual-agent system that supports presentation practice through two complementary roles: the Ideal Presentation Agent and the Coach Agent. The Ideal Presentation Agent converts user-provided slides into model presentation videos by combining slide processing, visual-language analysis, narration script generation, personalized voice synthesis, and synchronized video assembly. The Coach Agent then evaluates user-recorded presentations against these exemplars, conducting multimodal speech analysis and delivering structured feedback in an Observation-Impact-Suggestion (OIS) format. To enhance the authenticity of the learning experience, the Coach Agent incorporates an Audience Agent, which simulates the perspective of a human listener and provides humanized feedback reflecting audience reactions and engagement. Together, these agents form a closed loop of observation, practice, and feedback. Implemented on a robust backend with multi-model integration, voice cloning, and error handling mechanisms, the system demonstrates how AI-driven agents can provide engaging, human-centered, and scalable support for presentation skill development in both educational and professional contexts.

Paper Structure

This paper contains 41 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The end-to-end workflow of the PresentCoach system. The Ideal Presentation Agent (top) processes user inputs through a four-stage pipeline: (1) Slide Processing: converting .pptx to high-resolution PNGs; (2) Script Generation: employing a Visual Language Model (VLM) to create coherent narration scripts; (3) Speech Synthesis: utilizing personalized voice cloning for speech generation; and (4) Video Assembly: synchronizing slides and audio into a benchmark presentation. The Coach Agent (bottom) then evaluates user practice through: (i) Multimodal Speech Analysis comparing user performance against the ideal benchmark; (ii) Structured Feedback Generation delivering OIS-formatted suggestions; and (iii) Conversational AI Integration enabling interactive coaching via conversational agents.
  • Figure 2: (a) and (b) illustrate the Setup and Configuration stage (Stage 1), where users: A. Upload presentation files; B. Preview slide content; C. Record or upload a voice sample for cloning; D. Specify presentation requirements; E. View file upload status; F. Initiate generation. (c) visualizes the transparent pipeline of the Ideal Presentation Agent during generation (Stage 2), displaying real-time progress across key steps. (d) presents the Interactive Coaching and Practice Environment (Stage 3), which includes: H. The AI-generated ideal presentation video; I. Recording/upload controls for user practice; J. The corresponding script of the ideal narration; K. The conversational Coach Agent interface, providing structured feedback and interactive Q&A.
  • Figure 3: Participants’ usability ratings for the PresentCoach system on a 5-point Likert scale. Each item reflects an aspect of perceived ease of use, integration, and confidence. Overall responses indicate that participants found the system intuitive and well integrated after brief familiarization.
  • Figure 4: Perceived cognitive workload across six NASA-TLX dimensions (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration). Ratings on a 7-point scale show that most participants experienced the task as cognitively engaging but not overwhelming.
  • Figure 5: Participants’ evaluations of the dual-agent interaction on a 7-point Likert scale. Responses show that users clearly distinguished the roles of the Ideal Presentation Agent and the Coach Agent, perceived their collaboration as complementary, and regarded the overall experience as motivating and supportive.
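
The four-stage pipeline named in Figure 1 and the Observation-Impact-Suggestion (OIS) feedback format could be represented roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `IdealStage`, `OISFeedback`, and `stage_order`, as well as the sample feedback text, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class IdealStage(Enum):
    """Stages of the Ideal Presentation Agent, in the order given in Figure 1.

    The enum itself is a hypothetical modeling choice.
    """
    SLIDE_PROCESSING = 1   # .pptx converted to high-resolution PNGs
    SCRIPT_GENERATION = 2  # VLM writes the narration script
    SPEECH_SYNTHESIS = 3   # cloned voice reads the script
    VIDEO_ASSEMBLY = 4     # slides and audio are synchronized


@dataclass(frozen=True)
class OISFeedback:
    """One feedback item in Observation-Impact-Suggestion form (assumed fields)."""
    observation: str  # what the coach noticed in the recording
    impact: str       # how that affects the audience
    suggestion: str   # a concrete change to try next time

    def render(self) -> str:
        """Format the item as three labeled lines."""
        return (
            f"Observation: {self.observation}\n"
            f"Impact: {self.impact}\n"
            f"Suggestion: {self.suggestion}"
        )


def stage_order() -> List[str]:
    """Return stage names sorted by their position in the pipeline."""
    return [s.name for s in sorted(IdealStage, key=lambda s: s.value)]


if __name__ == "__main__":
    # Illustrative feedback text only; not taken from the study.
    item = OISFeedback(
        observation="The pace increased noticeably on the results slide.",
        impact="Listeners may miss the key numbers.",
        suggestion="Pause briefly after stating each result.",
    )
    print(" -> ".join(stage_order()))
    print(item.render())
```

Keeping the three OIS fields separate, rather than storing one free-text string, would let an interface display or filter each part independently; whether PresentCoach does this is not stated in the source.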