Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records

Jacopo Vitale, David Della Morte, Luca Bacco, Mario Merone, Mark de Groot, Saskia Haitjema, Leandro Pecchia, Bram van Es

TL;DR

This study introduces an automated classification framework leveraging unstructured Electronic Health Records (EHRs) and reveals that the custom Transformer architecture outperforms both traditional methods and generative LLMs, achieving the highest F1-scores and Matthews Correlation Coefficients.

Abstract

To overcome the limitations of manual administrative coding in geriatric Cardiovascular Risk Management, this study introduces an automated classification framework leveraging unstructured Electronic Health Records (EHRs). Using a dataset of 3,482 patients, we benchmarked three distinct modeling paradigms on longitudinal Dutch clinical narratives: classical machine learning baselines, specialized deep learning architectures optimized for large-context sequences, and general-purpose generative Large Language Models (LLMs) in a zero-shot setting. Additionally, we evaluated a late fusion strategy to integrate unstructured text with structured medication embeddings and anthropometric data. Our analysis reveals that the custom Transformer architecture outperforms both traditional methods and generative LLMs, achieving the highest F1-scores and Matthews Correlation Coefficients. These findings underscore the critical role of specialized hierarchical attention mechanisms in capturing long-range dependencies within medical texts, presenting a robust, automated alternative to manual workflows for clinical risk stratification.
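The two headline metrics named above can be illustrated with a minimal sketch. It assumes a binary labelling (1 = positive class), which the abstract does not state, and the function names are hypothetical rather than taken from the paper's code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (assumption: 1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels for illustration only; not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
```

Unlike F1, the Matthews Correlation Coefficient also uses the true-negative count, which is one reason the two are often reported together on imbalanced data.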

Paper Structure

This paper contains 10 sections, 1 equation, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Graphical representation of an EHR. Highlighted items are those used in this study; non-highlighted items are present in the EHR but were discarded.
  • Figure 2: Schematic overview of the Hierarchical Transformer classification pipeline. The architecture processes concatenated consults via BPE tokenization and hierarchical encoding, using either a CLS classification token (orange arrows) or Global Average Pooling (purple arrows), followed by optional late fusion with anthropometric data.
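
The two pooling strategies and the optional fusion step named in the Figure 2 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the CLS vector sits at position 0 of the encoder output and that fusion is a plain concatenation, neither of which the caption specifies. All function names are hypothetical.

```python
def cls_pool(token_vectors):
    """Use the first token's vector as the sequence summary (assumption: CLS is at index 0)."""
    return list(token_vectors[0])


def mean_pool(token_vectors):
    """Global average pooling: element-wise mean over all token vectors."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def late_fusion(text_vector, structured_features=None):
    """Append structured features to the pooled text vector (assumption: simple concatenation)."""
    if structured_features is None:
        return list(text_vector)  # text-only variant, no fusion
    return list(text_vector) + list(structured_features)


# Toy encoder output: 3 tokens with 2-dimensional vectors (made-up values).
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
cls_summary = cls_pool(tokens)
avg_summary = mean_pool(tokens)
# Hypothetical anthropometric features, e.g. weight in kg and height in m.
fused = late_fusion(avg_summary, [70.0, 1.75])
text_only = late_fusion(cls_summary)
```

The fused vector would then be passed to a classification head; in the text-only variant the pooled vector is passed on unchanged.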