Leveraging Taxonomy Similarity for Next Activity Prediction in Patient Treatment
Martin Kuhn, Joscha Grüger, Tobias Geyer, Ralph Bergmann
TL;DR
This work addresses the problem of predicting the next treatment step in patient care by incorporating domain knowledge from medical taxonomies into predictive process monitoring. The authors introduce TS4NAP, a taxonomy similarity framework that combines ICD-10-CM and ICD-10-PCS with bipartite graph matching to compute semantic similarities between patient traces and to propose likely next activities. Similarity between codes is formalized with an information-content-based measure (Sánchez similarity) and combined with an order-aware weighting to produce trace-level predictions. The approach is evaluated on 36 MIMIC-IV event logs in a leave-one-out setup. Taxonomy-informed predictions significantly outperform a baseline on most datasets, with an average similarity of around 74% and best cases of up to 97%, while also showing promise for explainability. The study demonstrates the practical potential of domain-specific knowledge to improve predictive accuracy and interpretability in clinical decision support, and outlines future directions including scalability, additional data sources, and integration with deep learning while preserving transparency.
Abstract
The rapid progress of modern medicine presents physicians with complex challenges when planning patient treatment. Next-activity prediction (NAP), a technique from the field of Predictive Business Process Monitoring, is a promising way to support physicians in treatment planning by proposing a possible next treatment step. Existing patient data, often in the form of electronic health records, can be analyzed to recommend the next suitable step in the treatment process. However, working with patient data poses many challenges because medical data is knowledge-intensive, highly variable, and scarce. To overcome these challenges, this article examines how the knowledge encoded in taxonomies can be used to improve and explain the prediction of the next activity in the treatment process. The study proposes the TS4NAP approach, which uses medical taxonomies (ICD-10-CM and ICD-10-PCS) in combination with graph matching to assess the similarity of medical codes and predict the next treatment step. The effectiveness of the proposed approach is evaluated using event logs derived from the MIMIC-IV dataset. The results highlight the potential of domain-specific knowledge held in taxonomies to improve the prediction of the next activity and, by making the predictions more explainable, to improve treatment planning and decision-making.
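To make the idea of taxonomy-based matching concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions: the code-level similarity is a toy shared-prefix ratio standing in for the information-content-based Sánchez similarity, the bipartite matching is solved by brute force over permutations (practical only for very small code sets), the order-aware weighting is omitted, and the next activity is simply taken from the most similar historical case. All function names (`code_similarity`, `trace_similarity`, `predict_next`) and the example activity labels are hypothetical.

```python
from itertools import permutations


def code_similarity(a: str, b: str) -> float:
    """Toy stand-in for a taxonomy similarity: length of the shared
    code prefix divided by the length of the longer code."""
    a, b = a.replace(".", ""), b.replace(".", "")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def trace_similarity(codes_a, codes_b) -> float:
    """Score of the best one-to-one matching between two code sets,
    normalised by the size of the larger set (brute-force search)."""
    if not codes_a or not codes_b:
        return 0.0
    small, large = sorted((list(codes_a), list(codes_b)), key=len)
    best = max(
        sum(code_similarity(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    return best / len(large)


def predict_next(prefix_codes, history):
    """history: list of (codes, next_activity) pairs from past cases.
    Returns the next activity of the most similar past case."""
    best_case = max(history, key=lambda case: trace_similarity(prefix_codes, case[0]))
    return best_case[1]


# Hypothetical past cases: (codes observed so far, activity that followed)
history = [
    (["I11.0", "E11.8"], "activity_A"),
    (["J18.9"], "activity_B"),
]
print(predict_next(["I10", "E11.9"], history))
```

Because the matched code pairs and their individual similarity scores are available, a prediction of this kind can be traced back to the codes that drove it, which is the sort of explainability the abstract refers to.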
