FocusMed: A Large Language Model-based Framework for Enhancing Medical Question Summarization with Focus Identification

Chao Liu, Ling Luo, Tengxiao Lv, Huan Zhuang, Lejing Yu, Jian Wang, Hongfei Lin

TL;DR

This work tackles Medical Question Summarization (MQS) by addressing two core issues: inaccurate identification of a user question's focus and model hallucination. It introduces FocusMed, an LLM-based framework that first extracts the core focus from consumer health questions using carefully designed prompts and a faithfulness check, then augments the training data with these focus-aware instances and fine-tunes base models using QLoRA. A multi-dimensional evaluation and selection mechanism then chooses among the outputs of multiple model configurations, scoring them for faithfulness, conciseness, and coverage. On the MEDIQA and MeqSum benchmarks, FocusMed achieves state-of-the-art performance with notable improvements in ROUGE-L and SummaC_ZS, while reducing hallucinations and preserving essential numerical and temporal details. The approach demonstrates the importance of explicit focus extraction and robust evaluation in medical summarization and offers practical benefits for downstream clinical interpretation and decision support; the code is publicly available.
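The selection step described above can be sketched as follows. This is a minimal illustration, not FocusMed's actual scoring: it assumes crude lexical proxies for each dimension (token overlap with the CHQ for faithfulness, overlap with the extracted focus for coverage, a length penalty for conciseness) and hypothetical equal weights. All function names are hypothetical; the candidate labels follow the "Qwen+LLaMA" naming used in Figure 2.

```python
import re

# Hypothetical equal weights; FocusMed's actual weighting is not given in this summary.
WEIGHTS = {"faithfulness": 1.0, "conciseness": 1.0, "coverage": 1.0}


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word tokens used by the lexical proxies below."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness(summary: str, chq: str) -> float:
    """Proxy: share of summary tokens that also occur in the source CHQ."""
    s = tokens(summary)
    return len(s & tokens(chq)) / len(s) if s else 0.0


def coverage(summary: str, focus: str) -> float:
    """Proxy: share of extracted-focus tokens that the summary keeps."""
    f = tokens(focus)
    return len(f & tokens(summary)) / len(f) if f else 0.0


def conciseness(summary: str, max_words: int = 30) -> float:
    """Proxy: 1.0 up to max_words, then decays linearly to 0.0 at 2 * max_words."""
    n = len(summary.split())
    return 1.0 if n <= max_words else max(0.0, 1.0 - (n - max_words) / max_words)


def total_score(summary: str, chq: str, focus: str) -> float:
    """Weighted sum of the three hypothetical proxy scores."""
    return (WEIGHTS["faithfulness"] * faithfulness(summary, chq)
            + WEIGHTS["conciseness"] * conciseness(summary)
            + WEIGHTS["coverage"] * coverage(summary, focus))


def select_summary(candidates: dict[str, str], chq: str, focus: str) -> tuple[str, float]:
    """Pick the configuration whose summary has the highest weighted score."""
    best = max(candidates, key=lambda name: total_score(candidates[name], chq, focus))
    return best, total_score(candidates[best], chq, focus)


if __name__ == "__main__":
    chq = ("I have had a headache for 3 days since starting lisinopril. "
           "Is this a side effect?")
    focus = "headache lisinopril side effect"  # assumed output of focus extraction
    candidates = {
        "Qwen+Qwen": "Can lisinopril cause headaches?",
        "Qwen+LLaMA": "Is a headache a side effect of lisinopril?",
        "LLaMA+Qwen": "Does ibuprofen cause migraines in children?",
    }
    print(select_summary(candidates, chq, focus))
```

In this example, the "Qwen+LLaMA" candidate wins because it keeps all four focus tokens and takes most of its wording from the CHQ. The "LLaMA+Qwen" candidate introduces a drug that the question never mentions, so it scores zero on both faithfulness and coverage. The proxies are deliberately simple (for instance, "headaches" does not match "headache"), so a practical system would substitute stronger measures.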

Abstract

With the rapid development of online medical platforms, consumer health questions (CHQs) have become an inefficient basis for diagnosis because they contain redundant information and frequent non-professional terms. The medical question summarization (MQS) task aims to transform CHQs into concise, doctor-style frequently asked questions (FAQs), but existing methods still struggle with poor identification of the question focus and with model hallucination. This paper explores the potential of large language models (LLMs) for the MQS task and finds that direct fine-tuning is prone to focus identification bias and to generating unfaithful content. To address this, we propose an optimization framework based on core focus guidance. First, we design a prompt template that drives LLMs to extract from each CHQ a core focus that is faithful to the original text. Then, we combine the extracted focuses with the original CHQ-FAQ pairs to construct a fine-tuning dataset that improves the model's ability to identify the question focus. Finally, we propose a multi-dimensional quality evaluation and selection mechanism that improves summary quality along several dimensions. We conduct comprehensive experiments on two widely adopted MQS datasets using three established evaluation metrics. The proposed framework achieves state-of-the-art performance across all measures, significantly improving the model's ability to identify the critical focus of questions and notably mitigating hallucinations. The source code is freely available at https://github.com/DUT-LiuChao/FocusMed.
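The focus extraction and dataset construction steps can be sketched as below, under several assumptions. The prompt wording is invented (the instructions the paper actually uses are shown in its Figure 3). `extract` is any callable that sends a prompt to an LLM and returns its text. The faithfulness check is a simple lexical test that every extracted phrase appears in the CHQ. The layout of the focus-aware instance (focus prepended to the question) is also hypothetical, as are all function names.

```python
import re
from typing import Callable, Iterable

# Hypothetical prompt wording; the instructions FocusMed actually uses are in its Figure 3.
FOCUS_PROMPT = (
    "Read the consumer health question below and list its core focus as short "
    "phrases copied verbatim from the text, separated by semicolons.\n\n"
    "Question: {chq}\nFocus:"
)


def normalize(text: str) -> str:
    """Lowercase and keep only alphanumeric word tokens, single-space separated."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def is_faithful(phrases: list[str], chq: str) -> bool:
    """Lexical stand-in for a faithfulness check: each phrase must occur in the CHQ."""
    source = f" {normalize(chq)} "
    return bool(phrases) and all(f" {normalize(p)} " in source for p in phrases)


def build_instances(pairs: Iterable[tuple[str, str]],
                    extract: Callable[[str], str]) -> list[dict[str, str]]:
    """Keep every original CHQ-FAQ pair and add a focus-aware copy when the focus is grounded."""
    instances = []
    for chq, faq in pairs:
        instances.append({"input": chq, "output": faq})  # original pair, unchanged
        raw = extract(FOCUS_PROMPT.format(chq=chq))       # hypothetical LLM call
        phrases = [p.strip() for p in raw.split(";") if p.strip()]
        if is_faithful(phrases, chq):                     # discard ungrounded focuses
            instances.append({
                "input": f"Focus: {'; '.join(phrases)}\nQuestion: {chq}",
                "output": faq,
            })
    return instances


if __name__ == "__main__":
    pairs = [("I have had a headache for 3 days since starting lisinopril. "
              "Is this a side effect?", "Can lisinopril cause headaches?")]
    for inst in build_instances(pairs, lambda prompt: "headache; lisinopril"):
        print(inst)
```

Keeping the original pair alongside the focus-aware copy reflects the abstract's description of combining extracted focuses with the original CHQ-FAQ pairs, while rejecting ungrounded focuses keeps hallucinated content out of the training data.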

Paper Structure

This paper contains 18 sections, 3 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: A comparative example of medical question summarization from the MEDIQA dataset. "Qwen" denotes the output generated directly by the fine-tuned Qwen2.5-7B model, while "Our Model" refers to the result obtained with the proposed FocusMed.
  • Figure 2: The overall framework of FocusMed. We first use LLMs to extract key focuses from CHQs, constructing an enhanced dataset for subsequent fine-tuning. Across the extraction and fine-tuning stages, we generate results using four model combinations based on Qwen2.5-7B and LLaMA3.1-8B (e.g., "Qwen+LLaMA" indicates that Qwen is used in the extraction stage and LLaMA in the fine-tuning stage). The final outputs are selected along three dimensions: faithfulness, conciseness, and coverage. Because coverage is computed in a similar way to faithfulness, its calculation is not shown in the figure.
  • Figure 3: Instructions used in the question focus extraction phase.
  • Figure 4: Performance of LLMs with different sizes in the extraction stage.
  • Figure 5: An example of accurately identifying the focus of the question. Blue font represents the intended core focus. Red font indicates cases where the model either incorrectly identifies the question focus or fails to capture it completely.
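The four model combinations in Figure 2 form a 2×2 grid of (extraction model, fine-tuning model) pairs. The sketch below enumerates them using the caption's "Qwen+LLaMA" naming and gathers one candidate summary per configuration for the selection step. The `generate` callable stands in for running a fine-tuned model on a question and is hypothetical, as are the function names.

```python
from itertools import product
from typing import Callable

# The two base models named in Figure 2, keyed by the short labels used in its caption.
BASE_MODELS = {"Qwen": "Qwen2.5-7B", "LLaMA": "LLaMA3.1-8B"}


def configurations() -> dict[str, tuple[str, str]]:
    """Map 'Extractor+FineTuned' labels to (extraction model, fine-tuning model)."""
    return {
        f"{ext}+{ft}": (BASE_MODELS[ext], BASE_MODELS[ft])
        for ext, ft in product(BASE_MODELS, repeat=2)
    }


def collect_candidates(chq: str,
                       generate: Callable[[str, str, str], str]) -> dict[str, str]:
    """Make one (hypothetical) generate(extractor, finetuned, chq) call per configuration."""
    return {name: generate(ext, ft, chq) for name, (ext, ft) in configurations().items()}


if __name__ == "__main__":
    for name, models in configurations().items():
        print(name, models)
```

Each question therefore receives four candidate summaries, which the selection stage then compares on faithfulness, conciseness, and coverage.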