uMedSum: A Unified Framework for Advancing Medical Abstractive Summarization

Aishik Nagar, Yutong Liu, Andy T. Liu, Viktor Schlegel, Vijay Prakash Dwivedi, Arun-Kumar Kaliya-Perumal, Guna Pratheep Kalanchiam, Yili Tang, Robby T. Tan

TL;DR

Medical abstractive summarization requires both faithfulness and informativeness, a balance often mishandled by prior methods. The authors introduce uMedSum, a modular three-stage framework that first generates an initial summary, then removes confabulated content via NLI-based decomposition into atomic facts, and finally adds missing information using a coverage-guided, extractive–abstractive hybrid. They accompany this with a comprehensive benchmark across three biomedical datasets and five metrics, demonstrating an average 11.8% improvement in reference-free metrics and strong clinician preference in difficult cases. An open-source benchmarking toolkit facilitates future research and adoption in clinically relevant, reliable medical summarization systems.
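The three stages described above can be sketched as a small pipeline. This is only an illustrative outline, not the authors' implementation: every function name, the `budget` parameter, and the toy stand-ins (substring matching in place of a real NLI model, sentence splitting in place of atomic-fact decomposition) are assumptions made for the example. The final stage here is purely extractive, whereas the paper describes an extractive–abstractive hybrid.

```python
from typing import Callable, List


def remove_confabulations(
    document: str,
    summary: str,
    decompose: Callable[[str], List[str]],
    entails: Callable[[str, str], bool],
) -> List[str]:
    """Stage 2: keep only the summary facts that the document entails."""
    return [fact for fact in decompose(summary) if entails(document, fact)]


def add_missing_information(
    source_sentences: List[str],
    facts: List[str],
    covers: Callable[[str, str], bool],
    budget: int,
) -> List[str]:
    """Stage 3: append up to `budget` source sentences that no kept fact covers."""
    missing = [
        sentence
        for sentence in source_sentences
        if not any(covers(fact, sentence) for fact in facts)
    ]
    return facts + missing[:budget]


def summarize(document, generate, decompose, entails, covers, budget=1):
    """Run the stages in order: draft, remove confabulations, add missing facts."""
    draft = generate(document)  # Stage 1: initial summary
    faithful = remove_confabulations(document, draft, decompose, entails)
    return add_missing_information(decompose(document), faithful, covers, budget)


# Toy stand-ins (hypothetical): real systems would use an LLM and an NLI model.
def split_sentences(text: str) -> List[str]:
    return [s.strip().rstrip(".") for s in text.split(". ") if s.strip()]


def substring_entails(document: str, fact: str) -> bool:
    return fact.lower() in document.lower()


def exact_cover(fact: str, sentence: str) -> bool:
    return fact.lower() == sentence.lower()


document = "Patient has fever. Patient was given paracetamol. Follow-up in two weeks."
draft_generator = lambda doc: "Patient has fever. Patient has a rash."

summary_facts = summarize(
    document, draft_generator, split_sentences, substring_entails, exact_cover
)
print(summary_facts)
```

In this toy run the unsupported "rash" statement is dropped in the second stage, and one uncovered source sentence is added back in the third. Passing the stages in as callables mirrors the modular design the summary describes, since each stage can be swapped independently.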

Abstract

Medical abstractive summarization faces the challenge of balancing faithfulness and informativeness. Current methods often sacrifice key information for faithfulness or introduce confabulations when prioritizing informativeness. While recent advancements in techniques like in-context learning (ICL) and fine-tuning have improved medical summarization, they often overlook faithfulness and informativeness and do not consider advanced methods such as model reasoning and self-improvement. Moreover, the field lacks a unified benchmark, and the variety of metrics and datasets in use hinders systematic evaluation. This paper addresses these gaps by presenting a comprehensive benchmark of six advanced abstractive summarization methods across three diverse datasets using five standardized metrics. Building on these findings, we propose uMedSum, a modular hybrid summarization framework that introduces novel approaches for sequential confabulation removal followed by key missing information addition, ensuring both faithfulness and informativeness. Our work improves upon previous GPT-4-based state-of-the-art (SOTA) medical summarization methods, significantly outperforming them in both quantitative metrics and qualitative domain expert evaluations. Notably, we achieve an average relative performance improvement of 11.8% in reference-free metrics over the previous SOTA. Doctors prefer uMedSum's summaries six times more often than those of the previous SOTA in difficult cases where confabulations or missing information may occur. These results highlight uMedSum's effectiveness and generalizability across various datasets and metrics, marking a significant advancement in medical summarization.
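For readers unfamiliar with the term, an "average relative performance improvement" is commonly computed as the mean of per-metric relative gains. The sketch below illustrates that arithmetic under this assumption; the text here does not state the exact aggregation used for the 11.8% figure, and the metric names and scores in the example are hypothetical placeholders, not results from the paper.

```python
from typing import Dict


def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def average_relative_improvement(
    new_scores: Dict[str, float], old_scores: Dict[str, float]
) -> float:
    """Mean of the per-metric relative gains, in percent."""
    gains = [
        relative_improvement(new_scores[metric], old_scores[metric])
        for metric in old_scores
    ]
    return sum(gains) / len(gains)


# Hypothetical scores for two made-up metrics (NOT taken from the paper).
baseline = {"metric_a": 0.50, "metric_b": 0.40}
proposed = {"metric_a": 0.55, "metric_b": 0.46}

average_gain = average_relative_improvement(proposed, baseline)
print(f"{average_gain:.1f}%")
```

With these placeholder values the per-metric gains are about 10% and 15%, so the average is about 12.5%. Averaging relative rather than absolute gains keeps metrics with different scales from dominating the result.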

Paper Structure (35 sections, 5 equations, 3 figures, 3 tables, 2 algorithms)

Figures (3)

  • Figure 1: Overview of the proposed three-stage framework. The process is illustrated with example outputs at each stage when using uMedSum with Element Aware Summarization and GPT-4. Blue text indicates confabulated information (information not grounded in the input document), while red text highlights added key information previously missing from the summary.
  • Figure 2: Overview of the proposed medical summarization benchmark for fair comparison.
  • Figure 3: Benchmark of different summarization techniques across datasets on selected metrics. ROUGE-LSum and BERTScore are reference-based metrics, while SummaC, QuestEval, and Entailment are reference-free metrics.