Leveraging Large Language Models to Extract and Translate Medical Information in Doctors' Notes for Health Records and Diagnostic Billing Codes

Peter Hartnett; Chung-Chi Huang; Sarah Hartnett; David Hartnett

Abstract

Physician burnout in the United States has reached critical levels, driven in part by the administrative burden of Electronic Health Record (EHR) documentation and complex diagnostic codes. To relieve this strain and maintain strict patient privacy, this thesis explores an on-device, offline automatic medical coding system. The work focuses on using open-weight Large Language Models (LLMs) to extract clinical information from physician notes and translate it into ICD-10-CM diagnostic codes without reliance on cloud-based services. A privacy-focused pipeline was developed using Ollama, LangChain, and containerized environments to evaluate multiple open-weight models, including Llama 3.2, Mistral, Phi, and DeepSeek, on consumer-grade hardware. Model performance was assessed for zero-shot, few-shot, and retrieval-augmented generation (RAG) prompting strategies using a novel benchmark of synthetic medical notes. Results show that strict JSON schema enforcement achieved near 100% formatting compliance, but accurate generation of specific diagnostic codes remains challenging for smaller local models (7B-20B parameters). Contrary to common prompt-engineering guidance, few-shot prompting degraded performance through overfitting and hallucinations. While RAG enabled limited discovery of unseen codes, it frequently saturated context windows, reducing overall accuracy. The findings suggest that fully automated unsupervised coding with local open-source models is not yet reliable; instead, a human-in-the-loop assisted coding approach is currently the most practical path forward. This work contributes a reproducible local LLM architecture and benchmark dataset for privacy-preserving medical information extraction and coding.

Paper Structure (49 sections, 19 figures, 2 tables)

Introduction
Literature Review
Advances in LLMs for Clinical Documentation and Prompt Engineering
Domain Adaptation and Fine-Tuning Approaches
Evaluation Frameworks
Methodology
Data Preparation
Structure of Medical Documentation
Medical Information for Extraction
Medical Coding Systems
Difficulties in Data and Evaluation
System Architecture
Example Zero-shot Prompt and Results
Few-shot Prompting
RAG-Enhanced Prompting
...and 34 more sections

Figures (19)

Figure 1: ICD-10-CM code structure.
Figure 2: System architecture.
Figure 3: Data processing chain.
Figure 4: Proposed multi-agent workflow.
Figure 5: Comparison of performance before and after iterative refinement of JSON terms.
...and 14 more figures
