Analysis of heart failure patient trajectories using sequence modeling

Falk Dippel; Yinan Yu; Annika Rosengren; Martin Lindgren; Christina E. Lundberg; Erik Aerts; Martin Adiels; Helen Sjöland

TL;DR

This paper addresses the lack of systematic performance and efficiency analyses for EHR-based sequence modeling in heart failure by running a comprehensive ablation across three architecture classes (Transformers, Transformers++, Mambas) and across input and temporal design choices. It analyzes token granularity, context length, model size, and history preprocessing in a large Swedish HF cohort (N = 42,820) on three one-year prediction tasks, and finds that Llama generally achieves the best discrimination and calibration, with strong data efficiency. The study gives concrete design recommendations, notably favoring a context length of $C=512$ with compact vocabularies and aggregated histories, and shows that strong performance can be retained with less training data and selective concept sets. Together, these findings offer practical guidance for developing clinically applicable, resource-efficient sequence models for EHR data and motivate extensions to multi-modal and external-validation studies.

Abstract

Transformers have defined the state of the art for clinical prediction tasks involving electronic health records (EHRs). The recently introduced Mamba architecture outperformed an advanced Transformer (Transformer++) based on Llama in handling long context lengths, while using fewer model parameters. Despite the impressive performance of these architectures, a systematic approach to empirically analyzing model performance and efficiency under various settings is not well established in the medical domain. The performance of six sequence models was investigated across three architecture classes (Transformers, Transformers++, Mambas) in a large Swedish heart failure (HF) cohort (N = 42,820), providing a clinically relevant case study. Patient data included diagnoses, vital signs, laboratory results, medications, and procedures extracted from in-hospital EHRs. The models were evaluated on three one-year prediction tasks: clinical instability (a readmission phenotype) after initial HF hospitalization, mortality after initial HF hospitalization, and mortality after latest hospitalization. Ablations cover modifications of the EHR-based input patient sequence, architectural model configurations, and temporal preprocessing techniques for data collection. Llama achieves the highest predictive discrimination and the best calibration, and shows robustness across all tasks, followed by Mambas. Both architectures demonstrate efficient representation learning, with tiny configurations surpassing other large-scale Transformers. At equal model size, Llama and Mambas achieve superior performance using 25% less training data. This paper presents a first ablation study with systematic design choices for input tokenization, model configuration, and temporal data preprocessing. Future model development for clinical prediction tasks using EHRs could build upon this study's recommendations as a starting point.
