A Comparative Study on Automatic Coding of Medical Letters with Explainability
Jamie Glen, Lifeng Han, Paul Rayson, Goran Nenadic
TL;DR
This paper investigates automatic coding of medical letters using NLP/ML, with an emphasis on explainability and local deployment. It compares three attention-based models (CAML, HLAN, MHLAT) on MIMIC-III data for ICD code prediction and maps ICD codes to SNOMED CT for clinical interpretability. The study shows that an explainable, attention-driven approach can produce usable ICD predictions and SNOMED CT mappings, achieving 97.98% coverage of attempted codes when mapping and visualization are included, albeit with modest F1 scores on the small test subset. The findings highlight practical challenges in real-world deployment (data alignment, long-tail code distributions, and SNOMED CT mapping gaps) while suggesting a viable path toward clinician-facing automated coding tools in local healthcare settings.
Abstract
This study explores the use of Natural Language Processing (NLP) and machine learning (ML) techniques to automate the coding of medical letters, with visualised explainability and in lightweight local computing settings. In current clinical practice, coding is a manual process that involves assigning a code to each condition, procedure, and medication in a patient's paperwork (e.g., the SNOMED CT code 56265001 for heart disease). There is preliminary research on automatic coding using state-of-the-art ML models; however, due to the complexity and size of these models, real-world deployment has not been achieved. To make automatic coding more feasible in practice, we explore solutions that run in a local computer setting; in addition, we explore explainability as a means of making AI models transparent. We used the publicly available MIMIC-III database and the HAN/HLAN network models for ICD code prediction. We also experimented with mapping between the ICD and SNOMED CT knowledge bases. In our experiments, the models provided useful information for 97.98% of codes. The results of this investigation can shed light on implementing automatic clinical coding in practice, such as in hospital settings, on the local computers used by clinicians. Project page: https://github.com/Glenj01/Medical-Coding
