Taming the Real-world Complexities in CPT E/M Coding with Large Language Models

Islam Nassar, Yang Lin, Yuan Jin, Rongxin Zhu, Chang Wei Tan, Zenan Zhai, Nitika Mathur, Thanh Tien Vu, Xu Zhong, Long Duong, Yuan-Fang Li

TL;DR

This paper tackles the challenge of automating CPT E/M coding in real-world healthcare settings by introducing ProFees, an LLM-based, modular architecture that jointly classifies encounter type and Medical Decision Making (MDM) complexity, then refines predictions through Recursive Criticism and a Self-Consistency ensemble before a deterministic rule-based decision tree outputs the final CPT code with justification. It integrates retrieval-augmented few-shot prompting, external exemplars from a vector database, and explicit critique to align outputs with the 2024 CPT MDM guidelines, addressing explainability and compliance needs. On a de-identified, expert-annotated real-world dataset, ProFees achieves substantial accuracy gains over a commercial CPT E/M coding system and outperforms strong baselines, validating its practicality and robustness in production workflows. The work demonstrates a path toward auditable, scalable, and domain-aligned automated coding in regulated clinical domains, with plans to extend to multiple codes and modifiers and to release synthetic data for broader research adoption.
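The last two stages described above, a self-consistency vote followed by a deterministic rule-based mapping, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each MDM element (problem, data, and risk complexity) is sampled K times and voted on separately, that the overall MDM level follows the two-of-three rule from the CPT office-visit guidelines, and that all three elements share one simplified label set. The names `majority_vote`, `mdm_level`, and `office_code` are hypothetical, and only office-visit codes are covered.

```python
from collections import Counter

# Ordered MDM complexity levels ("SF" = straightforward).
# Assumption: one shared label set for all three elements, for simplicity.
LEVELS = ["SF", "low", "moderate", "high"]

# Office-visit CPT E/M codes by patient status and overall MDM level.
OFFICE_CODES = {
    "new": {"SF": "99202", "low": "99203", "moderate": "99204", "high": "99205"},
    "established": {"SF": "99212", "low": "99213", "moderate": "99214", "high": "99215"},
}


def majority_vote(samples):
    """Self-consistency: return the most frequent of the K sampled labels."""
    return Counter(samples).most_common(1)[0][0]


def mdm_level(pc, dc, rc):
    """Two-of-three rule: the highest level met or exceeded by at least
    two of the three elements, which equals the median of the three."""
    ranks = sorted(LEVELS.index(level) for level in (pc, dc, rc))
    return LEVELS[ranks[1]]


def office_code(patient_status, pc_samples, dc_samples, rc_samples):
    """Vote per element (an assumption), then apply the deterministic rules."""
    pc, dc, rc = (majority_vote(s) for s in (pc_samples, dc_samples, rc_samples))
    return OFFICE_CODES[patient_status][mdm_level(pc, dc, rc)]


if __name__ == "__main__":
    code = office_code(
        "established",
        pc_samples=["moderate", "moderate", "moderate"],
        dc_samples=["low", "low", "moderate"],
        rc_samples=["moderate", "high", "moderate"],
    )
    print(code)  # 99214
```

Keeping the final mapping deterministic means that every predicted code can be traced back to the three voted element levels, which fits the explainability and compliance goals stated above.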

Abstract

Evaluation and Management (E/M) coding, under the Current Procedural Terminology (CPT) taxonomy, documents medical services provided to patients by physicians. Because these codes are used primarily for billing, it is in physicians' best interest to provide accurate CPT E/M codes. Automating this coding task will help alleviate physicians' documentation burden, improve billing efficiency, and ultimately enable better patient care. However, a number of real-world complexities have made automating E/M coding a challenging task. In this paper, we elaborate on some of the key complexities and present ProFees, our LLM-based framework that tackles them, followed by a systematic evaluation. On an expert-curated real-world dataset, ProFees achieves an increase in coding accuracy of more than 36% over a commercial CPT E/M coding system and almost 5% over our strongest single-prompt baseline, demonstrating its effectiveness in addressing the real-world complexities.

Paper Structure

This paper contains 40 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: CPT E/M coding decision tree used to guide human coding of office and preventive visits. Other visit types are omitted for brevity. "SF" stands for "straightforward".
  • Figure 2: ProFees architecture for CPT E/M coding prediction. PC, DC and RC stand for problem, data and risk complexity respectively.
  • Figure 3: Effect of self-consistency ($K$) on overall and intermediate performance.
  • Figure 4: Frequency distribution of CPT E/M codes on the Platinum subset.
  • Figure 5: Frequency distribution of CPT E/M codes on the Disagreement subset.
  • ...and 4 more figures