SlideBot: A Multi-Agent Framework for Generating Informative, Reliable, Multi-Modal Presentations

Eric Xie, Danielle Waterfield, Michael Kennedy, Aidong Zhang

TL;DR

SlideBot is a modular, multi-agent framework for generating informative, reliable, and practical university-level presentations. It grounds outputs in external sources and applies evidence-based instructional design (CLT and CTML). The pipeline decouples content retrieval, structured planning, and LaTeX Beamer code generation, coordinated by a central Moderator, and augments slides with instructor-facing comments and figure macros. Empirical evaluations in AI and biomedical education show that SlideBot outperforms Microsoft Copilot and direct prompting on informativeness, reliability, and practicality, with the gains driven more by architectural decomposition than by base model size. The work demonstrates a scalable, flexible approach to AI-assisted slides that mitigates hallucinations and supports instructor customization, with clear directions for future enhancements and broader domain deployment.

Abstract

Large Language Models (LLMs) have shown immense potential in education, automating tasks like quiz generation and content summarization. However, generating effective presentation slides introduces unique challenges due to the complexity of multimodal content creation and the need for precise, domain-specific information. Existing LLM-based solutions often fail to produce reliable and informative outputs, limiting their educational value. To address these limitations, we introduce SlideBot, a modular, multi-agent slide generation framework that integrates LLMs with retrieval, structured planning, and code generation. SlideBot is organized around three pillars: informativeness, ensuring deep and contextually grounded content; reliability, achieved by incorporating external sources through retrieval; and practicality, which enables customization and iterative feedback through instructor collaboration. It incorporates evidence-based instructional design principles from Cognitive Load Theory (CLT) and the Cognitive Theory of Multimedia Learning (CTML), using structured planning to manage intrinsic load and consistent visual macros to reduce extraneous load and enhance dual-channel learning. Within the system, specialized agents collaboratively retrieve information, summarize content, generate figures, and format slides using LaTeX, aligning outputs with instructor preferences through interactive refinement. Evaluations from domain experts and students in AI and biomedical education show that SlideBot consistently enhances conceptual accuracy, clarity, and instructional value. These findings demonstrate SlideBot's potential to streamline slide preparation while ensuring accuracy, relevance, and adaptability in higher education.
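
The decomposition described above (retrieval, structured planning, enhancement, and LaTeX Beamer code generation under a coordinating Moderator) can be illustrated with a minimal Python sketch. It is not the authors' implementation: in the paper the agents are LLM-driven, whereas here each one is a plain-function stand-in, and every name (Slide, retrieve, plan, enhance, to_beamer, moderate) is hypothetical. The keyword-match retriever and the fixed bullets-per-slide planner are simplifying assumptions chosen only to keep the example runnable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slide:
    title: str
    bullets: List[str]
    comment: str = ""  # instructor-facing note, emitted as a LaTeX comment


def escape_latex(text: str) -> str:
    """Escape a few LaTeX special characters common in plain prose (not exhaustive)."""
    for ch in "&%$#_":
        text = text.replace(ch, "\\" + ch)
    return text


def retrieve(topic: str, corpus: List[str]) -> List[str]:
    """Stand-in Retriever (assumption): keep passages mentioning any topic word."""
    words = [w.lower() for w in topic.split()]
    return [p for p in corpus if any(w in p.lower() for w in words)]


def plan(topic: str, passages: List[str], per_slide: int = 2) -> List[Slide]:
    """Stand-in planner (assumption): cap bullets per slide to limit load."""
    slides: List[Slide] = []
    for i in range(0, len(passages), per_slide):
        chunk = passages[i:i + per_slide]
        slides.append(Slide(title=f"{topic} ({len(slides) + 1})", bullets=chunk))
    return slides


def enhance(slides: List[Slide]) -> List[Slide]:
    """Stand-in Enhancer (assumption): attach an instructor-facing comment."""
    for s in slides:
        s.comment = f"Instructor note: {len(s.bullets)} grounded point(s) on this slide."
    return slides


def to_beamer(topic: str, slides: List[Slide]) -> str:
    """Stand-in Code Generator: render the slide plan as a Beamer document."""
    out = [
        r"\documentclass{beamer}",
        r"\title{" + escape_latex(topic) + "}",
        r"\begin{document}",
        r"\frame{\titlepage}",
    ]
    for s in slides:
        if s.comment:
            out.append("% " + s.comment)
        out.append(r"\begin{frame}{" + escape_latex(s.title) + "}")
        out.append(r"\begin{itemize}")
        out.extend(r"\item " + escape_latex(b) for b in s.bullets)
        out.append(r"\end{itemize}")
        out.append(r"\end{frame}")
    out.append(r"\end{document}")
    return "\n".join(out)


def moderate(topic: str, corpus: List[str]) -> str:
    """Stand-in Moderator: run the stages in order and refuse ungrounded output."""
    passages = retrieve(topic, corpus)
    if not passages:
        raise ValueError("no grounding passages found for topic")
    return to_beamer(topic, enhance(plan(topic, passages)))


if __name__ == "__main__":
    demo_corpus = [
        "Manifold learning reduces dimensionality.",
        "A manifold is locally Euclidean & smooth.",
    ]
    print(moderate("Manifold Learning", demo_corpus))
```

Because each stage only consumes the previous stage's output, any stand-in can be swapped for an LLM-backed agent without touching the others, which is the kind of separation the framework attributes its gains to. The sketch omits the feedback loops and figure insertion that the paper assigns to the Moderator and Enhancer.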

Paper Structure

This paper contains 27 sections, 20 figures, 1 table.

Figures (20)

  • Figure 1: SlideBot's slide generation pipeline operates in three stages: Content Retrieval, where the Moderator receives a topic from the Instructor and communicates with the Retriever to gather and summarize relevant information from a user-selected or automatically designated corpus; Slide Draft Generation, where the Moderator constructs a structured slide plan and the Code Generator translates it into LaTeX Beamer code; and Presentation Enhancement, where the Enhancer inserts figures and instructional comments before returning a completed presentation to the instructor. The Moderator coordinates all agents and manages feedback loops to ensure quality, adaptability, and consistency.
  • Figure 2: Qualitative comparison of slides generated by Copilot and GPT-4o Direct Prompting (left), and SlideBot (right) on the topic “Manifold Learning.” Copilot and Direct Prompt outputs lack explanatory depth, meaningful visuals, and relevant citations. In contrast, SlideBot produces focused, grounded, and pedagogically useful content by retrieving information from lensen2021genetic via the Retriever agent, adding figures and instructor comments via the Enhancer, and compiling structured LaTeX Beamer slides through the Code Generator.
  • Figure 3: Comparison of Direct Prompt and SlideBot generation (both using GPT-4o) across six presentation quality metrics, with Explanation Style, Structure & Flow, Credibility, and Overall Suitability obtained from a student survey, and Conceptual Accuracy and Topic Coverage obtained from an expert survey.
  • Figure 4: Comparison of informativeness metrics across model and context combinations obtained from an expert survey. Results for GPT-4o models (with and without context) are repeated from Figure \ref{fig:ablation_quality} to enable side-by-side comparison with GPT-4o-mini variants, renamed for clarity.
  • Figure 5: Prompt Template for the Moderator to select keywords based on the presentation topic for retrieval.
  • ...and 15 more figures