Real-time Speech Summarization for Medical Conversations
Khai Le-Duc, Khai-Nguyen Nguyen, Long Vo-Dang, Truong-Son Hy
TL;DR
The paper tackles information overload in real-time doctor-patient conversations by proposing a deployable real-time speech summarization system (RTSS) that generates a local summary after every N utterances and a global summary at the end of the dialogue. It introduces VietMed-Sum, the first speech summarization dataset for medical conversations, together with a GPT-assisted annotation workflow that produces gold-standard (GOLD) and synthetic (SYN) summaries and lowers labeling cost. In extensive experiments with Vietnamese and biomedical baselines (e.g., ViT5, ViPubmedT5, BARTpho), training on GOLD plus SYN data significantly improves ROUGE scores, whereas SYN data alone is less effective; two-step GPT→Human fine-tuning (first on GPT-generated summaries, then on human-annotated ones) narrows the remaining gaps further. The work demonstrates practical deployment feasibility, reduces annotation costs, and contributes a dataset and evaluation framework for real-time medical conversation understanding and downstream tasks in noisy ASR settings.
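The ROUGE results above measure n-gram overlap between a generated summary and a reference summary. As a rough illustration only, the sketch below computes ROUGE-1 F1 (unigram overlap) with lowercasing and whitespace tokenization. It is a simplified stand-in, not the scorer or configuration used in the paper, and the function name `rouge_1_f1` is hypothetical.

```python
from collections import Counter


def rouge_1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens, lowercased.

    Hypothetical helper for illustration; real scorers add their own
    tokenization rules and may report precision/recall separately.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token,
    # so repeated words are only credited as often as they appear in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 3 of 4 candidate tokens match; all 3 reference tokens are covered.
    print(rouge_1_f1("patient has high fever", "patient has fever"))
```

For the example, precision is 3/4 and recall is 1, so F1 is 6/7 (about 0.857). Because it counts unigrams only, this variant ignores word order; ROUGE-2 and ROUGE-L are usually reported alongside it to capture that.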
Abstract
In doctor-patient conversations, identifying medically relevant information is crucial, which motivates conversation summarization. In this work, we first propose the first deployable real-time speech summarization system for real-world industrial applications; it generates a local summary after every N speech utterances within a conversation and a global summary after the conversation ends. Our system could enhance user experience from a business standpoint while also reducing computational costs from a technical perspective. Second, we present VietMed-Sum which, to our knowledge, is the first speech summarization dataset for medical conversations. Third, we are the first to use an LLM and human annotators collaboratively to create gold-standard and synthetic summaries for medical conversation summarization. Finally, we present baseline results of state-of-the-art models on VietMed-Sum. All code, data (English-translated and Vietnamese) and models are available online: https://github.com/leduckhai/MultiMed/tree/master/VietMed-Sum
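The summarization schedule described in the abstract (a local summary after every N utterances, a global summary when the conversation ends) can be sketched as a small streaming class. This is a minimal, hypothetical illustration, not the authors' implementation: the class `RealTimeSummarizer` and the `summarize` callable (standing in for a fine-tuned model such as ViT5) are assumptions, as are two policy choices: each local summary covers only the most recent window of N utterances, and the global summary is generated from the full transcript.

```python
from typing import Callable, List, Optional


class RealTimeSummarizer:
    """Hypothetical sketch of an RTSS-style schedule.

    Assumptions (not from the paper): local summaries cover only the last
    window of n utterances; the global summary uses the whole transcript.
    """

    def __init__(self, summarize: Callable[[str], str], n: int) -> None:
        if n < 1:
            raise ValueError("n must be >= 1")
        self.summarize = summarize  # stand-in for a trained summarization model
        self.n = n
        self.transcript: List[str] = []  # every utterance seen so far
        self.window: List[str] = []  # utterances since the last local summary
        self.local_summaries: List[str] = []

    def add_utterance(self, utterance: str) -> Optional[str]:
        """Consume one (e.g. ASR-transcribed) utterance.

        Returns a local summary when the window reaches n utterances, else None.
        """
        self.transcript.append(utterance)
        self.window.append(utterance)
        if len(self.window) < self.n:
            return None
        local = self.summarize(" ".join(self.window))
        self.window = []  # start a fresh window
        self.local_summaries.append(local)
        return local

    def end_conversation(self) -> str:
        """Produce the global summary once the conversation is over.

        Trailing utterances that never filled a window are still covered here.
        """
        return self.summarize(" ".join(self.transcript))


if __name__ == "__main__":
    # Toy "model" that reports word counts, just to show when calls happen.
    rtss = RealTimeSummarizer(lambda text: f"{len(text.split())} words", n=2)
    for utt in ["a b", "c d", "e f", "g h", "i j"]:
        print(repr(rtss.add_utterance(utt)))
    print(rtss.end_conversation())
```

With n=2 and five two-word utterances, local summaries appear after the 2nd and 4th utterances ("4 words" each), and the global summary covers all ten words. Bounding each local call to N utterances keeps per-call input size constant as the conversation grows, which is one plausible way to limit computation; the abstract does not specify how the authors achieve their cost reduction.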
