Medifact at PerAnsSumm 2025: Leveraging Lightweight Models for Perspective-Specific Summarization of Clinical Q&A Forums

Nadia Saeed

TL;DR

The paper addresses perspective-aware summarization of healthcare community question-answering (CQA) with a lightweight hybrid pipeline. Perspective labeling combines weak supervision, an SVM classifier over sentence embeddings, and zero-shot BART-MNLI; summarization then runs in two stages, with BART for extraction and Pegasus for abstractive refinement. The system, MediFact, balances computational efficiency with contextual accuracy on the PerAnsSumm 2025 task, placing 12th among 100 teams. Key contributions are a modular workflow that reduces reliance on large language models, a combination of Snorkel-based labeling with transformer-based summarization, and an evaluation covering both span identification/classification and summarization metrics. The study points to practical ways of deploying clinically relevant CQA summarization with limited resources and outlines future work on lightweight fine-tuning, quantization, and retrieval-augmented generation.

Abstract

The PerAnsSumm 2025 challenge focuses on perspective-aware healthcare answer summarization (Agarwal et al., 2025). This work proposes a few-shot learning framework using a Snorkel-BART-SVM pipeline for classifying and summarizing open-ended healthcare community question-answering (CQA). An SVM model is trained with weak supervision via Snorkel, enhancing zero-shot learning. Extractive classification identifies perspective-relevant sentences, which are then summarized using a pretrained BART-CNN model. The approach achieved 12th place among 100 teams in the shared task, demonstrating computational efficiency and contextual accuracy. By leveraging pretrained summarization models, this work advances medical CQA research and contributes to clinical decision support systems.
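
The flow described above, in which perspective-relevant sentences are selected and then summarized, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `summarize_by_perspective`, `condense`, and `refine` are hypothetical names, the two summarizers are plain callables standing in for pretrained models, and prepending the question as context is a guess at how context might be integrated.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Summarizer = Callable[[str], str]  # text in, shorter text out


def summarize_by_perspective(
    labeled_sentences: Iterable[Tuple[str, str]],
    condense: Summarizer,
    refine: Summarizer,
    question: str = "",
) -> Dict[str, str]:
    """Group (sentence, perspective) pairs, then summarize each group in two stages."""
    # Collect the sentences that were assigned to each perspective.
    grouped: Dict[str, List[str]] = {}
    for sentence, perspective in labeled_sentences:
        grouped.setdefault(perspective, []).append(sentence)

    summaries: Dict[str, str] = {}
    for perspective, sentences in grouped.items():
        # Stage 1 (assumed): condense the selected sentences into a draft.
        draft = condense(" ".join(sentences))
        # Stage 2 (assumed): refine the draft, with the question as extra context.
        context = f"{question} {draft}".strip()
        summaries[perspective] = refine(context)
    return summaries
```

Any callable with the `Summarizer` signature fits, so pretrained summarization models can be swapped in or out without changing the grouping logic.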

Paper Structure

This paper contains 19 sections, 10 equations, 4 figures.

Figures (4)

  • Figure 1: Hybrid workflow for perspective classification and summarization. Perspectives are classified using heuristic labeling (Snorkel), SVM-based classification, and a zero-shot model fallback. Summarization is performed in two stages: extractive (BART) and abstractive (Pegasus), integrating the context for a refined output.
  • Figure 2: Training sample utilization for weak supervision. Known text spans from labeled data are used to train an SVM classifier, construct Snorkel labeling functions, and refine heuristic rules. The zero-shot model is excluded from direct training and is used as a fallback during classification.
  • Figure 3: Comparative analysis of MediFact's submitted models in the PerAnsSumm Shared Task – CL4Health@NAACL 2025.
  • Figure 4: Comparative Performance Analysis of MediFact Among the Top 12 Models in the PerAnsSumm Shared Task – CL4Health@NAACL 2025.
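
The classification cascade in the Figure 1 caption (heuristic labeling, then an SVM, then a zero-shot fallback) can be sketched as below. This is an illustration under assumptions rather than the paper's code: the keyword rules are hypothetical stand-ins for Snorkel labeling functions, a simple majority vote stands in for Snorkel's label aggregation, the perspective names are illustrative, and the ordering and the `min_confidence` threshold are guesses. The `supervised` and `zero_shot` arguments are plain callables standing in for the SVM and the zero-shot model.

```python
import re
from collections import Counter
from typing import Callable, Optional, Tuple

ABSTAIN = None  # returned by a rule that has no opinion on a sentence


# Hypothetical keyword rules standing in for Snorkel labeling functions.
def lf_suggestion(sentence: str) -> Optional[str]:
    hit = re.search(r"\b(you should|try|i recommend)\b", sentence.lower())
    return "SUGGESTION" if hit else ABSTAIN


def lf_experience(sentence: str) -> Optional[str]:
    hit = re.search(r"\b(i had|in my case|happened to me)\b", sentence.lower())
    return "EXPERIENCE" if hit else ABSTAIN


LABELING_FUNCTIONS = (lf_suggestion, lf_experience)


def heuristic_label(sentence: str) -> Optional[str]:
    """Majority vote over the rules; abstain on no votes or on a tie."""
    votes = Counter(lf(sentence) for lf in LABELING_FUNCTIONS)
    votes.pop(ABSTAIN, None)  # abstentions do not count as votes
    ranked = votes.most_common(2)
    if not ranked:
        return ABSTAIN
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def classify(
    sentence: str,
    supervised: Callable[[str], Tuple[str, float]],
    zero_shot: Callable[[str], str],
    min_confidence: float = 0.6,  # assumed threshold, not from the paper
) -> Tuple[str, str]:
    """Return (label, source), trying rules, then the classifier, then zero-shot."""
    label = heuristic_label(sentence)
    if label is not ABSTAIN:
        return label, "heuristic"
    label, confidence = supervised(sentence)
    if confidence >= min_confidence:
        return label, "supervised"
    return zero_shot(sentence), "zero-shot"
```

Returning the source alongside the label makes it easy to check how often each stage of the cascade actually decides, which is useful when tuning the rules or the threshold.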