PASS: Presentation Automation for Slide Generation and Speech
Tushar Aggarwal, Aarohi Bhand
TL;DR
This paper tackles end-to-end automation of presentations from general documents, covering both slide content and delivery. PASS is a two-module pipeline: Slide Generation (text and image extraction, title and content generation, image mapping) and Slide Presentation (script generation, AI voice synthesis), which together produce coherent slides with synchronized narration. It extends prior document-to-slide work by supporting general documents beyond research papers and by automating the delivery stage. Under an LLM-based evaluation framework that assesses coherence, redundancy, and relevance, GPT-PASS and Qwen-PASS achieve top performance on the SciDuet benchmark, indicating strong potential for real-world usage.
Abstract
In today's fast-paced world, effective presentations have become an essential tool for communication in both online and offline meetings. Crafting a compelling presentation requires significant time and effort, from gathering key insights to designing slides that convey information clearly and concisely. Despite the wealth of resources available, people often find themselves manually extracting crucial points, analyzing data, and organizing content in a way that ensures clarity and impact. Furthermore, a successful presentation goes beyond the slides; it demands rehearsal and the ability to weave a captivating narrative that fully engages the audience. Although there has been some exploration of automating document-to-slide generation, existing research is largely centered on converting research papers. In addition, automating the delivery of these presentations has yet to be addressed. We introduce PASS, a pipeline that generates slides from general Word documents, not just research papers, and also automates the oral delivery of the generated slides. PASS analyzes user documents to create a dynamic, engaging presentation with an AI-generated voice. Additionally, we developed an LLM-based evaluation metric to assess our pipeline across three critical dimensions of presentations: relevance, coherence, and redundancy. The data and code are available at https://github.com/AggarwalTushar/PASS.
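To make the two-stage structure described above concrete, the following is a minimal Python sketch of how a slide-generation stage and a slide-presentation stage could be chained. It is an illustration under stated assumptions, not the PASS implementation: the names `Slide`, `generate_slides`, `present_slides`, and the three stub callables are hypothetical, and the stubs stand in for the LLM and text-to-speech components, which the real pipeline would supply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Slide:
    """Hypothetical container for one slide and its narration."""
    title: str
    bullets: List[str]
    image: Optional[str] = None    # path of an image mapped to this slide, if any
    script: str = ""               # narration text for the slide
    audio: Optional[bytes] = None  # synthesized speech for the script


def generate_slides(
    sections: List[Tuple[str, str]],
    images: Dict[str, str],
    summarize: Callable[[str], List[str]],
) -> List[Slide]:
    """Stage 1 (slide generation): build one slide per (heading, body) pair.

    `summarize` is an assumed hook, e.g. an LLM call that returns bullet points.
    `images` maps a heading to an image path, standing in for image mapping.
    """
    return [
        Slide(title=heading, bullets=summarize(body), image=images.get(heading))
        for heading, body in sections
    ]


def present_slides(
    slides: List[Slide],
    write_script: Callable[[Slide], str],
    synthesize: Callable[[str], bytes],
) -> List[Slide]:
    """Stage 2 (slide presentation): attach a script and audio to each slide.

    `write_script` and `synthesize` are assumed hooks for an LLM and a TTS engine.
    """
    for slide in slides:
        slide.script = write_script(slide)
        slide.audio = synthesize(slide.script)
    return slides


# Placeholder components so the sketch runs without any external service.
def naive_summarize(text: str) -> List[str]:
    # Keep up to three non-empty sentences as bullets.
    return [s.strip() for s in text.split(".") if s.strip()][:3]


def naive_script(slide: Slide) -> str:
    return f"{slide.title}: " + "; ".join(slide.bullets)


def fake_tts(text: str) -> bytes:
    # Stand-in for speech synthesis: returns the UTF-8 bytes of the script.
    return text.encode("utf-8")


sections = [("Overview", "PASS has two modules. It narrates slides.")]
deck = present_slides(
    generate_slides(sections, {"Overview": "fig1.png"}, naive_summarize),
    naive_script,
    fake_tts,
)
```

Passing the language model and the speech engine in as plain callables keeps the two stages independent, so either component could be swapped (for example, a different LLM backend) without touching the pipeline skeleton.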
