MedCodER: A Generative AI Assistant for Medical Coding

Krishanu Das Baksi, Elijah Soba, John J. Higgins, Ravi Saini, Jaden Wood, Jane Cook, Jack Scott, Nirmala Pudota, Tim Weninger, Edward Bowen, Sanmitra Bhattacharya

TL;DR

This work introduces MedCodER, a Generative AI framework for automatic medical coding built on three core components: extraction, retrieval, and re-ranking. Ablation tests confirm that MedCodER's performance depends on integrating all three, as performance declines when the components are evaluated in isolation.

Abstract

Medical coding is essential for standardizing clinical data and communication but is often time-consuming and prone to errors. Traditional Natural Language Processing (NLP) methods struggle with automating coding due to the large label space, lengthy text inputs, and the absence of supporting evidence annotations that justify code selection. Recent advancements in Generative Artificial Intelligence (AI) offer promising solutions to these challenges. In this work, we introduce MedCodER, a Generative AI framework for automatic medical coding that leverages extraction, retrieval, and re-ranking techniques as core components. MedCodER achieves a micro-F1 score of 0.60 on International Classification of Diseases (ICD) code prediction, significantly outperforming state-of-the-art methods. Additionally, we present a new dataset containing medical records annotated with disease diagnoses, ICD codes, and supporting evidence texts (https://doi.org/10.5281/zenodo.13308316). Ablation tests confirm that MedCodER's performance depends on the integration of each of its aforementioned components, as performance declines when these components are evaluated in isolation.
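The micro-F1 score quoted above pools true positives, false positives, and false negatives over all records before computing F1. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `micro_f1` and the ICD-10 codes in the usage note are illustrative assumptions.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over per-record sets of codes.

    Hypothetical helper for illustration: counts are pooled across
    all records first, then a single F1 is computed from the totals.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same records")

    tp = fp = fn = 0
    for gold_codes, pred_codes in zip(gold, pred):
        tp += len(gold_codes & pred_codes)   # predicted and correct
        fp += len(pred_codes - gold_codes)   # predicted but not in gold
        fn += len(gold_codes - pred_codes)   # in gold but not predicted

    denominator = 2 * tp + fp + fn
    # With no gold codes and no predictions, return 0.0 by convention.
    return 2 * tp / denominator if denominator else 0.0
```

For example, with gold sets `[{"I10", "E11.9"}, {"J45.909"}]` and predictions `[{"I10"}, {"J45.909", "R06.02"}]`, the pooled counts are 2 true positives, 1 false positive, and 1 false negative, so the score is 4 / 6, roughly 0.67.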

Paper Structure (21 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: A schematic diagram of the MedCodER framework, illustrating its three primary components: (1) extraction of disease diagnoses, supporting evidence, and an initial list of ICD-10 codes; (2) retrieval of candidate ICD-10 codes for the extracted diagnosis using a vector database; and (3) re-ranking of these combined codes to produce a final list of $k$ ICD-10 codes. Extracted disease mentions and supporting evidence are mapped back to the medical record for in-context highlighting, aiding medical coders in the coding process.
  • Figure 2: Recall and Precision @$k$ for variations of the MedCodER framework
  • Figure 3: A representation of MedCodER in action. On the left, the medical record is annotated with the disease diagnosis for shortness of breath and its supporting evidence texts. On the right, the corresponding top 5 ICD-10 code suggestions are shown. Other diagnoses and their supporting evidence texts can be shown or hidden using the 'Show' buttons next to them.
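
Figure 2 reports Recall and Precision @$k$, and Figure 3 shows a top-5 suggestion list. As a rough illustration of how such metrics are commonly computed for one ranked list of codes, the sketch below uses the usual definitions (hits in the top $k$ divided by $k$ for precision, and by the number of gold codes for recall). The paper's exact definitions may differ; the function name and example codes are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(
    ranked: List[str], gold: Set[str], k: int
) -> Tuple[float, float]:
    """Precision@k and recall@k for one ranked list of codes.

    Hypothetical helper for illustration. Assumes `ranked` contains
    no duplicate codes and is ordered from most to least relevant.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")

    top_k = ranked[:k]
    hits = sum(1 for code in top_k if code in gold)

    precision = hits / k                          # fraction of the k slots that are correct
    recall = hits / len(gold) if gold else 0.0    # fraction of gold codes recovered
    return precision, recall
```

With the ranked list `["R06.02", "J45.909", "I10", "E11.9", "J44.9"]` and gold codes `{"R06.02", "I10"}`, setting `k=1` gives precision 1.0 and recall 0.5, while `k=3` gives precision 2/3 and recall 1.0. This trade-off between the two metrics as $k$ grows is what a plot like Figure 2 typically visualizes.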