Taming the Real-world Complexities in CPT E/M Coding with Large Language Models
Islam Nassar, Yang Lin, Yuan Jin, Rongxin Zhu, Chang Wei Tan, Zenan Zhai, Nitika Mathur, Thanh Tien Vu, Xu Zhong, Long Duong, Yuan-Fang Li
TL;DR
This paper tackles the challenge of automating CPT E/M coding in real-world healthcare settings by introducing ProFees, a modular LLM-based architecture. ProFees jointly classifies the encounter type and the Medical Decision Making (MDM) complexity, refines these predictions through Recursive Criticism and a Self-Consistency ensemble, and then passes them to a deterministic rule-based decision tree that outputs the final CPT code with a justification. To align outputs with the 2024 CPT MDM guidelines and to meet explainability and compliance needs, it combines retrieval-augmented few-shot prompting, using external exemplars retrieved from a vector database, with explicit critique. On a de-identified, expert-annotated real-world dataset, ProFees improves accuracy by more than 36% over a commercial CPT E/M coding system and by almost 5% over the strongest single-prompt baseline, supporting its practicality and robustness in production workflows. The work points toward auditable, scalable, and domain-aligned automated coding in regulated clinical domains; planned extensions include handling multiple codes and modifiers and releasing synthetic data for broader research use.
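The last two stages of that pipeline, a Self-Consistency vote followed by a deterministic code lookup, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ProFees' implementation: it assumes the LLM stages have already produced several sampled (encounter type, MDM level) pairs and that voting happens on the joint pair. The label names, the helpers `self_consistency_vote` and `decide_code`, and the restriction to office/outpatient codes (99202-99205 for new patients, 99212-99215 for established patients) are hypothetical simplifications of the rule-based decision tree the paper describes.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical label vocabulary for the MDM stage (assumption, not the paper's schema).
MDM_LEVELS = ("straightforward", "low", "moderate", "high")

# Illustrative lookup covering only office/outpatient visits (CPT 99202-99205 for
# new patients, 99212-99215 for established patients). The real ProFees decision
# tree is assumed to handle more encounter types and edge cases.
OFFICE_CODES: Dict[str, Dict[str, str]] = {
    "new": dict(zip(MDM_LEVELS, ("99202", "99203", "99204", "99205"))),
    "established": dict(zip(MDM_LEVELS, ("99212", "99213", "99214", "99215"))),
}

Prediction = Tuple[str, str]  # (encounter_type, mdm_level) from one sampled LLM run


def self_consistency_vote(samples: List[Prediction]) -> Prediction:
    """Return the most frequent joint prediction among the sampled outputs.

    Ties are broken by first occurrence, because Counter.most_common orders
    equal counts by insertion order.
    """
    if not samples:
        raise ValueError("need at least one sampled prediction")
    winner, _ = Counter(samples).most_common(1)[0]
    return winner


def decide_code(encounter_type: str, mdm_level: str) -> Tuple[str, str]:
    """Deterministically map the voted labels to a CPT code and a short justification."""
    try:
        code = OFFICE_CODES[encounter_type][mdm_level]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {encounter_type!r}, {mdm_level!r}"
        ) from None
    reason = f"{encounter_type} office/outpatient visit with {mdm_level} MDM -> {code}"
    return code, reason


samples = [("established", "moderate"), ("established", "moderate"), ("established", "low")]
print(decide_code(*self_consistency_vote(samples)))
```

With three samples where two agree on ("established", "moderate"), the vote returns that pair and the lookup yields 99214. Keeping the final mapping as plain rules, rather than asking the LLM for the code directly, is what makes each output traceable to an explicit justification.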
Abstract
Evaluation and Management (E/M) coding, under the Current Procedural Terminology (CPT) taxonomy, documents the medical services that physicians provide to patients. Because these codes are used primarily for billing, it is in physicians' best interest to assign them accurately. Automating this coding task can help alleviate physicians' documentation burden, improve billing efficiency, and ultimately enable better patient care. However, a number of real-world complexities make E/M coding automation challenging. In this paper, we elaborate on some of the key complexities, present ProFees, our LLM-based framework that tackles them, and evaluate it systematically. On an expert-curated real-world dataset, ProFees improves coding accuracy by more than 36% over a commercial CPT E/M coding system and by almost 5% over our strongest single-prompt baseline, demonstrating its effectiveness in addressing these real-world complexities.
