Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution
Himanshu Maheshwari, Sambaran Bandyopadhyay, Aparna Garimella, Anandhavelu Natarajan
TL;DR
This work tackles the generation of non-linear, attribution-rich slide decks from long, text-only documents. It introduces GDP, a graph-based pipeline that maps document structure to presentation content. GDP trains a paragraph-level affinity classifier, builds a document graph from its predictions, and applies a GCN trained with negative sampling, followed by spectral clustering, to group paragraphs into slide clusters. An LLM (GPT-3.5) then renders the clusters one slide at a time, conditioning on the preceding slides. Key contributions include a RoBERTa-based classifier trained on SciDuet, an unsupervised graph learning objective that uses a binary cross-entropy loss on edges, and an evaluation framework that combines automated metrics with human judgments across domains. Compared with purely prompting-based baselines, GDP shows improved content coverage, faithful attribution, and competitive non-linearity, which suggests it is practical for producing high-quality first-draft presentations from lengthy documents.
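As a rough illustration of the edge-level objective mentioned above, the sketch below builds a paragraph graph from pairwise affinity scores and computes a binary cross-entropy loss over observed edges and randomly sampled non-edges. The 0.5 threshold, the inner-product edge scorer, and all function names are assumptions made for this example and are not taken from the paper. The GCN encoder and the spectral clustering step are omitted, and the embeddings are hand-written stand-ins.

```python
import math
import random


def build_graph(affinity, threshold=0.5):
    """Return undirected edges (i, j) with i < j whose affinity exceeds the threshold.

    `affinity` is a square matrix of pairwise paragraph scores; the threshold
    value is an assumption for this sketch.
    """
    n = len(affinity)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if affinity[i][j] > threshold]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def edge_bce_loss(emb, edges, num_neg=1, rng=None):
    """Mean binary cross-entropy over edges (label 1) and sampled non-edges (label 0).

    Edge probability is sigmoid(z_i . z_j), a common link-prediction choice
    assumed here rather than confirmed by the source.
    """
    rng = rng or random.Random(0)
    n = len(emb)
    edge_set = set(edges)
    non_edges = [(i, j) for i in range(n) for j in range(i + 1, n)
                 if (i, j) not in edge_set]
    # Negative sampling: draw up to num_neg non-edges per observed edge.
    negatives = rng.sample(non_edges, min(len(non_edges), num_neg * len(edges)))
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for i, j in edges:
        loss -= math.log(sigmoid(dot(emb[i], emb[j])) + eps)
    for i, j in negatives:
        loss -= math.log(1.0 - sigmoid(dot(emb[i], emb[j])) + eps)
    return loss / max(1, len(edges) + len(negatives))


# Toy document: paragraphs 0 and 2 are related, as are 1 and 3 (non-contiguous).
affinity = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
edges = build_graph(affinity)

# Embeddings that agree with the graph versus ones that group adjacent paragraphs.
good = [[2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.0, 2.0]]
bad = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
good_loss = edge_bce_loss(good, edges)
bad_loss = edge_bce_loss(bad, edges)
```

In this toy setup the embeddings that place related, non-contiguous paragraphs close together receive the lower loss, which is the behavior such an objective is meant to reward before clustering paragraphs into slides.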
Abstract
Automatically generating a presentation from the text of a long document is a challenging and useful problem. In contrast to a flat summary, a presentation needs a non-linear narrative, i.e., the content of a slide can come from different, non-contiguous parts of the given document. However, it is difficult to incorporate such a non-linear mapping of content to slides while ensuring that the content stays faithful to the document. LLMs are prone to hallucination, and their performance degrades as the input document grows longer. To address this, we propose a novel graph-based solution in which we learn a graph from the input document and combine a graph neural network with an LLM to generate a presentation with attribution of content for each slide. We conduct thorough experiments to show the merit of our approach compared to directly using LLMs for this task.
