Re-FRAME the Meeting Summarization SCOPE: Fact-Based Summarization and Personalization via Questions

Frederic Kirstein, Sonu Kumar, Terry Ruas, Bela Gipp

TL;DR

Re-FRAME the Meeting Summarization SCOPE presents FRAME, a four-stage, fact-centric approach that treats meeting summarization as an enrichment task grounded in verifiable facts. It introduces SCOPE, a reason-out-loud personalization protocol that grounds content selection in reader-specific goals, and P-MESA, a reference-free personalization metric that aligns with human judgments. Across QMSum and FAME benchmarks, FRAME reduces hallucination and omission, while SCOPE enhances knowledge-level fit and goal alignment without compromising general quality. The work demonstrates cross-model robustness, ablation-informed architecture choices, and an open-source toolkit enabling multilingual and multi-source extensions. Overall, the framework argues for controlled, faithful, and personalized meeting summaries grounded in explicit reasoning about readers’ needs.

Abstract

Meeting summarization with large language models (LLMs) remains error-prone, often producing outputs with hallucinations, omissions, and irrelevancies. We present FRAME, a modular pipeline that reframes summarization as a semantic enrichment task. FRAME extracts and scores salient facts, organizes them thematically, and uses these to enrich an outline into an abstractive summary. To personalize summaries, we introduce SCOPE, a reason-out-loud protocol that has the model build a reasoning trace by answering nine questions before content selection. For evaluation, we propose P-MESA, a multi-dimensional, reference-free evaluation framework to assess if a summary fits a target reader. P-MESA reliably identifies error instances, achieving >= 89% balanced accuracy against human annotations and strongly aligns with human severity ratings (r >= 0.70). On QMSum and FAME, FRAME reduces hallucination and omission by 2 out of 5 points (measured with MESA), while SCOPE improves knowledge fit and goal alignment over prompt-only baselines. Our findings advocate for rethinking summarization to improve control, faithfulness, and personalization.
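The staged flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: each stage below is a toy, rule-based stand-in for what the paper performs with an LLM. The names `Fact`, `extract_facts`, `score_facts`, `organize`, `enrich`, and `summarize` are hypothetical, as is the use of a keyword set to stand in for the SCOPE reasoning trace that informs salience scoring.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fact:
    statement: str      # the extracted claim
    context: str        # hypothetical context field; here simply the speaker
    score: float = 0.0  # salience score, filled in by score_facts


def extract_facts(transcript):
    """Fact identification: one fact per 'Speaker: text' line (toy stand-in for an LLM)."""
    facts = []
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        if sep and text.strip():
            facts.append(Fact(statement=text.strip(), context=speaker.strip()))
    return facts


def score_facts(facts, reader_keywords):
    """Salience scoring: share of reader keywords found in each statement.

    Assumption: reader_keywords is a crude placeholder for a reader-specific
    reasoning trace such as the one SCOPE would produce.
    """
    for fact in facts:
        words = set(fact.statement.lower().replace(".", "").split())
        fact.score = len(words & reader_keywords) / max(len(reader_keywords), 1)
    return facts


def organize(facts, threshold):
    """Organization: keep salient facts and group them by theme (here: the speaker)."""
    themes = defaultdict(list)
    for fact in facts:
        if fact.score >= threshold:
            themes[fact.context].append(fact)
    return dict(themes)


def enrich(themes):
    """Enrichment: turn the outline of themes into text by attaching the kept facts."""
    return "\n".join(
        f"{theme}: " + " ".join(f.statement for f in facts)
        for theme, facts in themes.items()
    )


def summarize(transcript, reader_keywords, threshold=0.5):
    """Run the stages in order: extract, score, organize, enrich."""
    facts = score_facts(extract_facts(transcript), reader_keywords)
    return enrich(organize(facts, threshold))


if __name__ == "__main__":
    meeting = (
        "PM: The budget is 10k.\n"
        "Dev: I like pizza.\n"
        "Dev: The release ships Friday."
    )
    print(summarize(meeting, {"budget", "release"}))
```

In this sketch, only facts that survive scoring can reach the output, which mirrors the idea of grounding the summary in extracted facts rather than generating it directly from the transcript.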

Paper Structure

This paper contains 103 sections, 4 equations, 21 figures, 28 tables.

Figures (21)

  • Figure 1: FRAME pipeline with SCOPE integration. FRAME structures summarization in four stages: fact identification, note taking, organization, and enrichment-based generation. SCOPE plugs into salience scoring by injecting a reasoning trace derived from reader-specific questions.
  • Figure 2: Comparison of our statement-context tuple (OURS) against a high-granularity fact (Atomic) and a high-context fact (Molecular).
  • Figure 3: Four-quadrant plot of average total architecture cost vs. quality measured by MESA. The top left indicates the ideal of high quality at low cost. Blue dots are single LLM instances for GPT-4o, Gemini 1.5 Pro, Llama 3.1 8B, and Gemma 3 4B. Orange squares are FRAME summaries with the different backbones. FEEDBACK-3 refers to the self-refinement baseline by KirsteinLG25a with a GPT-4o backbone and three refinement loops.
  • Figure 4: Snippet of facts extracted from a QMSum meeting.
  • Figure 5: Example of a role-playing GPT answering the questionnaire (Table \ref{tab:reasoning_questionnaire}) before fact selection.
  • ...and 16 more figures