PASS: Presentation Automation for Slide Generation and Speech

Tushar Aggarwal, Aarohi Bhand

TL;DR

This paper tackles end-to-end automation of presentations from general Word documents, covering both slide content and delivery. PASS is a two-module pipeline: Slide Generation (text and image extraction, title and content generation, image mapping) and Slide Presentation (script generation, AI voice synthesis), which together produce coherent slides with synchronized narration. It extends prior document-to-slide work by supporting general documents beyond research papers and by automating the delivery stage. An LLM-based evaluation framework assessing coherence, redundancy, and relevance shows that GPT-PASS and Qwen-PASS achieve top performance on the SciDuet benchmark, indicating strong potential for real-world usage.

Abstract

In today's fast-paced world, effective presentations have become an essential tool for communication in both online and offline meetings. Crafting a compelling presentation requires significant time and effort, from gathering key insights to designing slides that convey information clearly and concisely. Despite the wealth of resources available, people often find themselves manually extracting crucial points, analyzing data, and organizing content in a way that ensures clarity and impact. Furthermore, a successful presentation goes beyond the slides; it demands rehearsal and the ability to weave a captivating narrative that fully engages the audience. Although there has been some exploration of automating document-to-slide generation, existing research is largely centered on converting research papers. In addition, automating the delivery of these presentations has yet to be addressed. We introduce PASS, a pipeline that generates slides from general Word documents, going beyond research papers, and also automates the oral delivery of the generated slides. PASS analyzes user documents to create a dynamic, engaging presentation with an AI-generated voice. Additionally, we developed an LLM-based evaluation metric to assess our pipeline across three critical dimensions of presentations: relevance, coherence, and redundancy. The data and code are available at https://github.com/AggarwalTushar/PASS.

Paper Structure

This paper contains 15 sections, 15 figures, and 2 tables.

Figures (15)

  • Figure 1: Overview of the PASS pipeline. It takes a user-provided document as input and generates presentation slides along with AI-generated voice narration.
  • Figure 2: Architecture of the PASS pipeline. It consists of two main modules—Slide Generation and Slide Presentation—each further divided into five and two sub-modules, respectively.
  • Figure 3: Prompt used for extracting topics for a non-technical audience.
  • Figure 4: Prompt used for extracting topics for a technical audience.
  • Figure 5: Prompt used for extracting content for a non-technical audience.
  • ...and 10 more figures
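
The two-module layout described in the TL;DR and the Figure 2 caption can be pictured as an ordered chain of stages. The following is only a minimal sketch, not the authors' implementation. The stage names come from the TL;DR, and reading its list as five Slide Generation stages plus two Slide Presentation stages is an assumption made to match the Figure 2 caption. `run_pipeline`, `make_placeholder`, and the `state` dictionary are hypothetical names, and every stage is a placeholder that records only that it ran.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]
Stage = Callable[[State], State]

# Stage names taken from the TL;DR; the five/two split is assumed
# from the Figure 2 caption and may not match the paper exactly.
SLIDE_GENERATION: List[str] = [
    "text_extraction",
    "image_extraction",
    "title_generation",
    "content_generation",
    "image_mapping",
]
SLIDE_PRESENTATION: List[str] = ["script_generation", "voice_synthesis"]


def make_placeholder(name: str) -> Stage:
    """Return a stand-in stage that only appends its name to the trace."""
    def stage(state: State) -> State:
        trace = list(state.get("trace", []))
        trace.append(name)
        return {**state, "trace": trace}
    return stage


def run_pipeline(document_path: str,
                 overrides: Optional[Dict[str, Stage]] = None) -> State:
    """Run all stages in order; `overrides` swaps in real stage functions."""
    overrides = overrides or {}
    state: State = {"document": document_path, "trace": []}
    for name in SLIDE_GENERATION + SLIDE_PRESENTATION:
        stage = overrides.get(name, make_placeholder(name))
        state = stage(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("report.docx")  # hypothetical input file
    print(" -> ".join(result["trace"]))
```

Passing a function in `overrides` under a stage name replaces the placeholder for that stage, so a real extractor, LLM call, or text-to-speech backend could be substituted one stage at a time without changing the ordering logic.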