Textual-to-Visual Iterative Self-Verification for Slide Generation

Yunqing Xu, Xinbei Ma, Jiyang Qiu, Hai Zhao

TL;DR

This work tackles automating academic slide generation by decomposing the task into content and layout generation and introducing a textual-to-visual iterative self-verification loop. The method uses a three-part content pipeline (Text Retriever, Figure Extractor, Content Generator) followed by a two-stage layout process (Initial Layout Draft and Textual-to-Visual Refinement) powered by a Reviewer + Refiner cycle. Empirical results show improvements in content alignment, logical flow, visual appeal, and readability over baselines, with GPT-4o delivering strong content performance and the modality-transformed refinement enhancing layout quality. The approach enables more efficient, high-quality slide generation and can be extended to complete multi-page presentations, offering practical value for researchers and educators.

Abstract

Generating presentation slides is a time-consuming task that urgently requires automation. Due to their limited flexibility and lack of automated refinement mechanisms, existing autonomous LLM-based agents face constraints in real-world applicability. We decompose the task of generating missing presentation slides into two key components: content generation and layout generation, aligning with the typical process of creating academic slides. First, we introduce a content generation approach that enhances coherence and relevance by incorporating context from surrounding slides and leveraging section retrieval strategies. For layout generation, we propose a textual-to-visual self-verification process using an LLM-based Reviewer + Refiner workflow, transforming complex textual layouts into intuitive visual formats. This modality transformation simplifies the task, enabling accurate and human-like review and refinement. Experiments show that our approach significantly outperforms baseline methods in terms of alignment, logical flow, visual appeal, and readability.
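
The control flow of a Reviewer + Refiner cycle can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Box` type, the slide size in arbitrary integer units, the 2-unit gap, and the rule-based `review` and `refine` functions, which stand in for the LLM-based Reviewer and Refiner. The sketch also does not model the rendering step; the toy reviewer checks box geometry directly where the paper's Reviewer inspects a visualized layout.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

SLIDE_W, SLIDE_H = 100, 75  # hypothetical slide size in arbitrary integer units
GAP = 2                     # hypothetical vertical spacing used by the toy refiner


@dataclass(frozen=True)
class Box:
    """One layout element: a named rectangle with top-left corner (x, y)."""
    name: str
    x: int
    y: int
    w: int
    h: int


Issue = Tuple[str, str, str]  # (kind, box name, second box name or "")


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share interior area (touching edges do not count)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def review(layout: List[Box]) -> List[Issue]:
    """Toy Reviewer: rule-based stand-in for an LLM inspecting a rendered slide."""
    issues: List[Issue] = []
    for i, a in enumerate(layout):
        if a.x < 0 or a.y < 0 or a.x + a.w > SLIDE_W or a.y + a.h > SLIDE_H:
            issues.append(("overflow", a.name, ""))
        for b in layout[i + 1:]:
            if overlaps(a, b):
                issues.append(("overlap", a.name, b.name))
    return issues


def refine(layout: List[Box], issues: List[Issue]) -> List[Box]:
    """Toy Refiner: applies one simple fix per reported issue, in order."""
    boxes = {b.name: b for b in layout}
    for kind, first, second in issues:
        if kind == "overlap":
            # Push the second box below the first one.
            upper, lower = boxes[first], boxes[second]
            boxes[second] = replace(lower, y=upper.y + upper.h + GAP)
        elif kind == "overflow":
            # Shrink to the slide if needed, then clamp the position inside it.
            b = boxes[first]
            w, h = min(b.w, SLIDE_W), min(b.h, SLIDE_H)
            x = min(max(b.x, 0), SLIDE_W - w)
            y = min(max(b.y, 0), SLIDE_H - h)
            boxes[first] = replace(b, x=x, y=y, w=w, h=h)
    return [boxes[b.name] for b in layout]


def self_verify(layout: List[Box], max_rounds: int = 5) -> Tuple[List[Box], int]:
    """Alternate review and refine until no issues remain or the budget runs out."""
    for round_no in range(max_rounds):
        issues = review(layout)
        if not issues:
            return layout, round_no
        layout = refine(layout, issues)
    return layout, max_rounds


draft = [
    Box("title", 5, 3, 90, 10),
    Box("body", 5, 10, 60, 30),     # overlaps the title
    Box("figure", 60, 15, 50, 30),  # overlaps the body and runs off the slide
]
final, rounds = self_verify(draft)
print(rounds, final)
```

The loop stops as soon as the reviewer reports nothing, and `max_rounds` bounds the cost when the refiner cannot reach a clean layout. In an LLM-backed version, `review` and `refine` would be replaced by model calls while `self_verify` stays the same.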

Paper Structure

This paper contains 56 sections, 4 equations, 2 figures, and 2 tables.

Figures (2)

  • Figure 1: Overall Framework
  • Figure 2: Iterative Layout Refinement in the Reviewer + Refiner Workflow