Re-FRAME the Meeting Summarization SCOPE: Fact-Based Summarization and Personalization via Questions
Frederic Kirstein, Sonu Kumar, Terry Ruas, Bela Gipp
TL;DR
This paper presents FRAME, a four-stage, fact-centric approach that treats meeting summarization as an enrichment task grounded in verifiable facts. It also introduces SCOPE, a reason-out-loud personalization protocol that grounds content selection in reader-specific goals, and P-MESA, a reference-free personalization metric that aligns with human judgments. On the QMSum and FAME benchmarks, FRAME reduces hallucination and omission, while SCOPE improves knowledge-level fit and goal alignment without compromising general quality. The work reports cross-model robustness, ablation-informed architecture choices, and an open-source toolkit that supports multilingual and multi-source extensions. Overall, the paper argues for controlled, faithful, and personalized meeting summaries grounded in explicit reasoning about readers' needs.
Abstract
Meeting summarization with large language models (LLMs) remains error-prone, often producing outputs with hallucinations, omissions, and irrelevancies. We present FRAME, a modular pipeline that reframes summarization as a semantic enrichment task. FRAME extracts and scores salient facts, organizes them thematically, and uses them to enrich an outline into an abstractive summary. To personalize summaries, we introduce SCOPE, a reason-out-loud protocol that has the model build a reasoning trace by answering nine questions before content selection. For evaluation, we propose P-MESA, a multi-dimensional, reference-free evaluation framework that assesses whether a summary fits a target reader. P-MESA reliably identifies error instances, achieving ≥ 89% balanced accuracy against human annotations, and aligns strongly with human severity ratings (r ≥ 0.70). On QMSum and FAME, FRAME reduces hallucination and omission by 2 out of 5 points (measured with MESA), while SCOPE improves knowledge fit and goal alignment over prompt-only baselines. Our findings advocate for rethinking summarization to improve control, faithfulness, and personalization.
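The two agreement statistics quoted for P-MESA, balanced accuracy and the correlation coefficient r, can be illustrated with a minimal sketch. It assumes binary error labels (1 = error present) and that r denotes Pearson's correlation, which the abstract does not state. The function names and toy values are hypothetical and are not taken from the paper or its toolkit.

```python
from math import sqrt


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = error present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # share of real errors that were flagged
    specificity = tn / (tn + fp)  # share of clean cases left unflagged
    return (sensitivity + specificity) / 2


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy data (illustrative only): human annotations vs. metric decisions.
human_flags = [1, 1, 1, 0, 0, 0, 0, 0]
metric_flags = [1, 1, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(human_flags, metric_flags))

# Toy severity ratings (illustrative only).
human_severity = [1, 2, 3, 4, 5]
metric_severity = [2, 4, 6, 8, 10]
print(pearson_r(human_severity, metric_severity))
```

Balanced accuracy averages the per-class recall, so it is not inflated when error-free cases far outnumber error cases, which is a common situation in error annotation.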
