Faithful Summarization of Consumer Health Queries: A Cross-Lingual Framework with LLMs

Ajwad Abrar, Nafisa Tabassum Oeshy, Prianka Maheru, Farzana Tabassum, Tareque Mohmud Chowdhury

TL;DR

Consumer health questions (CHQs) are abundant and lengthy, so summaries must stay faithful to avoid misrepresenting medical details. The authors propose a framework that combines TextRank-based extractive sentence selection, medical NER guidance, and fine-tuning of an LLM (LLaMA-2-7B with LoRA) to produce faithful summaries on the English MeQSum and Bangla BanglaCHQ-Summ datasets. The approach yields consistent gains over zero-shot baselines and prior methods on quality metrics (ROUGE, BERTScore, readability) and faithfulness metrics (SummaC, AlignScore), and human evaluation judges about 82% of the summaries faithful. This cross-lingual, faithfulness-aware framework supports safer deployment of LLMs in healthcare and informs multilingual CHQ summarization practice.

Abstract

Summarizing consumer health questions (CHQs) can ease communication in healthcare, but unfaithful summaries that misrepresent medical details pose serious risks. We propose a framework that combines TextRank-based sentence extraction and medical named entity recognition with large language models (LLMs) to enhance faithfulness in medical text summarization. In our experiments, we fine-tuned the LLaMA-2-7B model on the MeQSum (English) and BanglaCHQ-Summ (Bangla) datasets, achieving consistent improvements across quality (ROUGE, BERTScore, readability) and faithfulness (SummaC, AlignScore) metrics, and outperforming zero-shot baselines and prior systems. Human evaluation further shows that over 80% of generated summaries preserve critical medical information. These results highlight faithfulness as an essential dimension for reliable medical summarization and demonstrate the potential of our approach for safer deployment of LLMs in healthcare contexts.
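To make the extraction step concrete, the following is a minimal sketch of TextRank-style sentence ranking followed by a filter on medical entities. It is not the authors' implementation: the `MEDICAL_TERMS` lexicon is a hypothetical stand-in for a real medical NER model, and the function names, the word-overlap similarity, and the `top_k` cutoff are assumptions chosen for illustration.

```python
import math
import re

# Hypothetical stand-in for a medical NER model: a tiny hand-written lexicon.
MEDICAL_TERMS = {"diabetes", "metformin", "insulin", "nausea"}


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Word-overlap similarity between two token sets, normalized by log lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def select_sentences(sentences, top_k=2):
    """Keep the top_k highest-ranked sentences that mention a lexicon term."""
    scores = textrank(sentences)
    candidates = [
        i for i, s in enumerate(sentences) if tokenize(s) & MEDICAL_TERMS
    ]
    best = sorted(candidates, key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]  # restore original order


question = [
    "Hello, I hope you are doing well today.",
    "My father has diabetes and takes metformin every day.",
    "Since he started metformin he has had nausea every morning.",
    "Thank you so much for your time.",
]
selected = select_sentences(question)
```

In a pipeline like the one described, the selected sentences would serve as the condensed input passed to the fine-tuned LLM. The greeting and sign-off carry no lexicon terms, so the filter drops them regardless of their graph score.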

Paper Structure

This paper contains 11 sections, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Proposed framework: TextRank extracts relevant sentences containing medical entities, which are used to fine-tune the LLM. The final summary is selected to maximize both accuracy and faithfulness.