CLINICSUM: Utilizing Language Models for Generating Clinical Summaries from Patient-Doctor Conversations

Subash Neupane; Himanshu Tripathi; Shaswata Mitra; Sean Bozorgzad; Sudip Mittal; Shahram Rahimi; Amin Amirlatifi

TL;DR

ClinicSum addresses automatic SOAP-form clinical summary generation from doctor–patient conversations by coupling a retrieval-based filtering stage with a fine-tuned language-model generator. The system is trained on an SME-validated dataset of 1,473 conversation–summary pairs derived from FigShare and MTS-Dialog, and it uses an ensemble of sparse and dense retrieval with Reciprocal Rank Fusion to pass concise, relevant context to a fine-tuned pre-trained language model (PLM). Automatic metrics (ROUGE, BERTScore) and expert human evaluations show that ClinicSum with open-source models (notably LLAMA-3-8B) outperforms GPT-based approaches in both lexical and semantic fidelity, while reducing hallucinations through token filtering. The work demonstrates the practical potential of deploying efficient, domain-targeted summarization in clinical settings and identifies expanding the data, improving scalability, and mitigating bias and hallucination as priority directions.
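To make the retrieval step concrete, here is a minimal sketch of Reciprocal Rank Fusion over a sparse and a dense ranking. It is illustrative only and not the authors' implementation: the function name `reciprocal_rank_fusion`, the chunk ids, and the smoothing constant `k=60` (a commonly used default) are assumptions not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each list is ordered best-first. A chunk at rank r (1-based) in a list
    contributes 1 / (k + r) to its fused score. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical rankings of transcript chunks from two retrievers.
sparse_ranking = ["c2", "c1", "c3"]  # e.g. a keyword-based retriever
dense_ranking = ["c1", "c3", "c2"]   # e.g. an embedding-based retriever

fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(fused)
```

Because the fusion uses only rank positions, the sparse and dense retrievers do not need comparable score scales, which is one reason this scheme suits ensembles of heterogeneous retrievers.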

Abstract

This paper presents ClinicSum, a novel framework designed to automatically generate clinical summaries from patient-doctor conversations. It uses a two-module architecture: a retrieval-based filtering module that extracts Subjective, Objective, Assessment, and Plan (SOAP) information from conversation transcripts, and an inference module powered by fine-tuned Pre-trained Language Models (PLMs), which use the extracted SOAP data to generate abstracted clinical summaries. To fine-tune the PLMs, we created a training dataset of 1,473 conversation-summary pairs by consolidating two publicly available datasets, FigShare and MTS-Dialog, with ground-truth summaries validated by Subject Matter Experts (SMEs). ClinicSum's effectiveness is evaluated through both automatic metrics (e.g., ROUGE, BERTScore) and expert human assessments. Results show that ClinicSum outperforms state-of-the-art PLMs, achieving higher precision, recall, and F1 scores in automatic evaluations and receiving high preference from SMEs in human assessment, making it a robust solution for automated clinical summarization.
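As a rough illustration of the precision, recall, and F1 figures reported for lexical metrics, the sketch below computes a simplified ROUGE-1 score from unigram overlap. It assumes lowercased whitespace tokenization and omits the stemming and other preprocessing that standard ROUGE toolkits apply; the function name `rouge1` and the example sentences are hypothetical, and the paper's actual evaluation setup may differ.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Simplified ROUGE-1: unigram-overlap precision, recall, and F1.

    Assumes lowercased whitespace tokenization (a simplification).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as in the reference.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical generated summary vs. reference summary.
p, r, f = rouge1(
    "patient reports chest pain",
    "patient reports mild chest pain today",
)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Lexical overlap of this kind rewards matching wording, which is why it is typically paired with an embedding-based metric such as BERTScore to capture semantic agreement.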
