Searching for Best Practices in Medical Transcription with Large Language Model

Jiafeng Li; Yanda Mu

TL;DR

This paper addresses the challenge of accurately transcribing medical monologues that combine Indian-accented speech with dense medical terminology, using prompt-engineered corrections from Large Language Models (LLMs). It compares three approaches: (i) correcting the full automatic speech recognition (ASR) transcript in one set, (ii) correcting it sentence by sentence, and (iii) a manual plus LLM sentence-by-sentence workflow. The goals are to reduce the Word Error Rate (WER), defined as $WER = \frac{S + D + I}{N}$, where $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted words and $N$ is the number of words in the reference transcript, and to improve medical-terminology fidelity as measured by KMTER. The results show that moving from single-set correction to sentence-level correction, especially when combined with human validation, progressively reduces both general transcription errors and term-specific mistakes. The study demonstrates a practical, cross-validated pathway to more reliable clinical documentation in accented-speech contexts, with potential to streamline physician notes and billing records. The approach is evaluated with WER and KMTER on a curated cardiology-focused dataset, and resources for replication are available at the linked GitHub repository.
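The WER formula counts edits over an optimal word-level alignment between the reference transcript and the ASR (or corrected) hypothesis. Below is a minimal Python sketch of that computation, assuming plain whitespace tokenization with no case or punctuation normalization. The paper's exact scoring setup and its KMTER definition are not given here, and the function names are illustrative, not taken from the paper's repository.

```python
def wer_counts(reference: str, hypothesis: str) -> tuple[int, int, int, int]:
    """Return (S, D, I, N) from a minimum-edit word alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(
                dp[i - 1][j - 1] + mismatch,  # match or substitution
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
            )
    # Walk back through the table to split the total edit count into S, D, I.
    s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        diag_ok = (
            i > 0 and j > 0
            and dp[i][j] == dp[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1])
        )
        if diag_ok:
            s += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return s, d, ins, n


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, where N is the number of reference words."""
    s, d, ins, n = wer_counts(reference, hypothesis)
    if n == 0:
        raise ValueError("reference transcript must contain at least one word")
    return (s + d + ins) / n


if __name__ == "__main__":
    ref = "the patient has atrial fibrillation"
    hyp = "the patient has a trial fibrillation"
    print(wer_counts(ref, hyp))       # (1, 0, 1, 5)
    print(word_error_rate(ref, hyp))  # 0.4
```

Here the mis-recognized term "atrial" becomes one substitution plus one insertion, so two errors over five reference words give a WER of 0.4. Because the denominator is the reference length $N$, WER can exceed 1.0 when the hypothesis contains many insertions.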

Abstract

The transcription of medical monologues, especially those dense with specialized terminology and delivered with a distinct accent, is a significant challenge for existing automated systems. This paper introduces an approach that uses a Large Language Model (LLM) to produce accurate medical transcripts from audio recordings of doctors' monologues, focusing specifically on Indian accents. Our methodology applies LLM-based correction to automatic speech recognition (ASR) output in order to lower the Word Error Rate (WER) and ensure precise recognition of critical medical terms. In testing on a dataset of medical recordings, the approach substantially improves both overall transcription accuracy and the fidelity of key medical terminology. These results suggest that the proposed system could aid clinical documentation, offering healthcare providers a reliable tool for streamlining transcription while maintaining high accuracy.

Paper Structure

This paper contains 9 sections, 1 equation, 1 figure, and 1 table.

Introduction
Dataset
Methods
ASR transcript correction by LLM in one set
ASR transcript sentence by sentence correction by LLM
Manual + LLM sentence by sentence correction of ASR transcript
Results
Conclusion
Notes.

Figures (1)

Figure 1: Manual + LLM workflow
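The Methods subsections contrast correcting the ASR transcript in one set, correcting it sentence by sentence, and combining sentence-by-sentence LLM correction with manual review (Figure 1). Below is a minimal Python sketch of how the three strategies differ structurally. It assumes two hypothetical callables: `llm_correct`, a stand-in for a prompt-engineered LLM request, and `reviewer`, a stand-in for the human validation step. The paper's prompts, model, sentence segmentation, and review interface are not described here, and the regex-based splitter is a naive assumption.

```python
import re
from typing import Callable, Optional

# Hypothetical interfaces, not from the paper:
#   Corrector: sends text to a prompt-engineered LLM and returns its correction.
#   Reviewer: a human step that sees the ASR sentence and the LLM proposal
#             and returns the validated sentence.
Corrector = Callable[[str], str]
Reviewer = Callable[[str, str], str]


def split_sentences(text: str) -> list[str]:
    """Naively split after '.', '?' or '!' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def correct_in_one_set(transcript: str, llm_correct: Corrector) -> str:
    """Approach (i): a single LLM request for the whole ASR transcript."""
    return llm_correct(transcript)


def correct_sentence_by_sentence(
    transcript: str,
    llm_correct: Corrector,
    reviewer: Optional[Reviewer] = None,
) -> str:
    """Approach (ii) when reviewer is None; approach (iii) otherwise."""
    corrected = []
    for sentence in split_sentences(transcript):
        proposal = llm_correct(sentence)
        if reviewer is not None:
            # Manual validation: a human accepts or edits the LLM proposal.
            proposal = reviewer(sentence, proposal)
        corrected.append(proposal)
    return " ".join(corrected)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model or review UI.
    def fake_llm(text: str) -> str:
        return text.replace("a trial fibrillation", "atrial fibrillation")

    def fake_reviewer(original: str, proposal: str) -> str:
        return proposal.replace("meta prolol", "metoprolol")

    asr = "Patient has a trial fibrillation. Start meta prolol 25 mg."
    print(correct_in_one_set(asr, fake_llm))
    # Patient has atrial fibrillation. Start meta prolol 25 mg.
    print(correct_sentence_by_sentence(asr, fake_llm, fake_reviewer))
    # Patient has atrial fibrillation. Start metoprolol 25 mg.
```

With this toy string-replacement corrector, approaches (i) and (ii) produce the same text; in practice any difference between them comes from how an LLM behaves on a long transcript versus short sentences, which this sketch does not model. The review step is where a term the LLM missed can still be fixed.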
