Tell me what I need to know: Exploring LLM-based (Personalized) Abstractive Multi-Source Meeting Summarization
Frederic Kirstein, Terry Ruas, Robert Kratel, Bela Gipp
TL;DR
This work addresses salience identification and personalization in meeting summarization with a three-stage, RAG-based pipeline that enriches transcripts with information inferred from supplementary materials. A personalization protocol extracts participant personas to tailor summaries for specific readers. Using the MS-AMI dataset and extensive experiments with GPT-4 Turbo and smaller backbones, the approach improves informativeness and relevance and reduces hallucinations in both general and personalized summaries, and the experiments explore practical deployment trade-offs. The findings highlight the value of distributing multi-source challenges across dedicated modules and pave the way for on-device and cost-aware implementations in real-world settings.
Abstract
Meeting summarization is crucial in digital communication, but existing solutions struggle with salience identification, which is needed to generate personalized, workable summaries, and with context understanding, which is needed to fully comprehend a meeting's content. Previous attempts to address these issues by considering related supplementary resources (e.g., presentation slides) alongside transcripts are hindered by models' limited context sizes and by the additional complexities of the multi-source task, such as identifying relevant information in additional files and seamlessly aligning it with the meeting content. This work explores multi-source meeting summarization with supplementary materials through a three-stage large language model approach: identifying transcript passages that need additional context, inferring relevant details from supplementary materials and inserting them into the transcript, and generating a summary from this enriched transcript. Our multi-source approach enhances model understanding, increasing summary relevance by ~9% and producing more content-rich outputs. We introduce a personalization protocol that extracts participant characteristics and tailors summaries accordingly, improving informativeness by ~10%. This work further provides insights on performance-cost trade-offs across four leading model families, including edge-device capable options. Our approach can be extended to similar complex generative tasks that benefit from additional resources and personalization, such as dialogue systems and action planning.
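
The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LLM` callable, the prompt wording, the word-overlap `retrieve` helper, and the `persona` string are all hypothetical stand-ins chosen for illustration, and a real system would replace them with an actual model client, a proper retriever, and the paper's persona-extraction protocol.

```python
from typing import Callable, List, Optional

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def find_gaps(llm: LLM, transcript: List[str]) -> List[int]:
    """Stage 1 (sketch): indices of turns the model says need more context."""
    gaps = []
    for i, turn in enumerate(transcript):
        answer = llm(
            "Does this passage need additional context? Answer yes or no.\n" + turn
        )
        if answer.strip().lower().startswith("yes"):
            gaps.append(i)
    return gaps


def retrieve(turn: str, sources: List[str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank source chunks by shared words."""
    words = set(turn.lower().split())
    ranked = sorted(
        sources,
        key=lambda s: len(words & set(s.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def enrich(llm: LLM, transcript: List[str], sources: List[str],
           gaps: List[int]) -> List[str]:
    """Stage 2 (sketch): infer details from sources and insert them inline."""
    enriched = list(transcript)
    for i in gaps:
        material = "\n".join(retrieve(transcript[i], sources))
        note = llm(
            "Infer the detail this passage refers to.\n"
            f"Passage: {transcript[i]}\nMaterial: {material}"
        )
        enriched[i] = f"{transcript[i]} [context: {note}]"
    return enriched


def summarize(llm: LLM, enriched: List[str],
              persona: Optional[str] = None) -> str:
    """Stage 3 (sketch): summarize, optionally tailored to a reader persona."""
    prompt = "Summarize this meeting.\n" + "\n".join(enriched)
    if persona:
        prompt += f"\nTailor the summary to this reader: {persona}"
    return llm(prompt)


def pipeline(llm: LLM, transcript: List[str], sources: List[str],
             persona: Optional[str] = None) -> str:
    """Chain the three stages over one transcript and its extra materials."""
    gaps = find_gaps(llm, transcript)
    enriched = enrich(llm, transcript, sources, gaps)
    return summarize(llm, enriched, persona)
```

Keeping each stage behind its own function mirrors the idea of distributing the multi-source challenges across dedicated modules, so a smaller model could in principle be swapped in for one stage without touching the others.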
