Faithful Summarization of Consumer Health Queries: A Cross-Lingual Framework with LLMs
Ajwad Abrar, Nafisa Tabassum Oeshy, Prianka Maheru, Farzana Tabassum, Tareque Mohmud Chowdhury
TL;DR
Consumer health questions (CHQs) are abundant and often lengthy, so faithful summarization is essential to avoid misrepresenting medical details. The authors propose a framework that combines TextRank-based extractive sentence selection, medical NER guidance, and fine-tuning of an LLM (LLaMA-2-7B with LoRA) to produce faithful summaries on the English MeQSum and Bangla BanglaCHQ-Summ datasets. The approach yields consistent gains over zero-shot baselines and prior methods across quality and faithfulness metrics (ROUGE, BERTScore, readability, SummaC, AlignScore), with human evaluation indicating about 82% faithfulness. This cross-lingual, faithfulness-aware framework supports safer deployment of LLMs in healthcare and informs multilingual CHQ summarization practices.
Abstract
Summarizing consumer health questions (CHQs) can ease communication in healthcare, but unfaithful summaries that misrepresent medical details pose serious risks. We propose a framework that combines TextRank-based sentence extraction and medical named entity recognition with large language models (LLMs) to enhance faithfulness in medical text summarization. In our experiments, we fine-tuned the LLaMA-2-7B model on the MeQSum (English) and BanglaCHQ-Summ (Bangla) datasets, achieving consistent improvements across quality (ROUGE, BERTScore, readability) and faithfulness (SummaC, AlignScore) metrics, and outperforming zero-shot baselines and prior systems. Human evaluation further shows that over 80% of generated summaries preserve critical medical information. These results highlight faithfulness as an essential dimension for reliable medical summarization and demonstrate the potential of our approach for safer deployment of LLMs in healthcare contexts.
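To make the extractive step concrete, the following is a minimal sketch of TextRank-style sentence selection on a toy English CHQ. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity function, damping factor, number of selected sentences, or tokenizer, so the word-overlap similarity, `damping=0.85`, `k=2`, and the regex-based helpers (`split_sentences`, `tokenize`, `select_top_sentences`) are all hypothetical choices. The ASCII-only tokenizer would also need replacing for Bangla text.

```python
import math
import re


def split_sentences(text):
    # Naive splitter on ., ?, ! followed by whitespace (an assumption for illustration).
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    # Lowercased set of ASCII alphanumeric tokens; English-only by design.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    # Word-overlap similarity normalized by log sentence lengths.
    # Sentences with fewer than two tokens get zero similarity,
    # which also keeps the denominator strictly positive.
    if len(a) < 2 or len(b) < 2:
        return 0.0
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence-similarity graph via power iteration.
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores
    return scores


def select_top_sentences(text, k=2):
    # Keep the k highest-scoring sentences, returned in their original order.
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


chq = (
    "I have type 2 diabetes and take metformin daily. "
    "Can metformin cause stomach pain in diabetes patients? "
    "Thanks!"
)
selected = select_top_sentences(chq, k=2)
```

In a pipeline like the one described, the selected sentences would presumably be passed to the LLM alongside the recognized medical entities; how the authors combine these inputs is not stated in the abstract.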
