EHRMamba: Towards Generalizable and Scalable Foundation Models for Electronic Health Records

Adibvafa Fallahpour, Mahshid Alinoori, Wenqian Ye, Xu Cao, Arash Afkanpour, Amrit Krishnan

TL;DR

EHRMamba tackles the practical deployment bottlenecks of Transformer-based EHR modeling by combining a linear-time state-space Mamba backbone with Multitask Prompted Finetuning (MPF) to enable long-context processing and cross-task learning. The Odyssey toolkit provides HL7 FHIR-based data standardization and tooling to ease integration into hospital systems, while MPF reduces finetuning overhead and enhances generalization across six clinical tasks on MIMIC-IV, achieving state-of-the-art results with improved efficiency. The work demonstrates dual competence in EHR forecasting and clinical outcome prediction, and supports real-world deployment through scalable architecture and interpretable forecasting. Together, EHRMamba and Odyssey mark a major step toward scalable, generalizable foundation models for electronic health records with direct clinical impact.

Abstract

Transformers have significantly advanced the modeling of Electronic Health Records (EHR), yet their deployment in real-world healthcare is limited by several key challenges. Firstly, the quadratic computational cost and insufficient context length of these models hinder hospitals' ability to process the extensive medical histories typical of EHR data. Additionally, existing models employ separate finetuning for each clinical task, complicating maintenance in healthcare environments. Moreover, these models focus exclusively on either clinical prediction or EHR forecasting, lacking proficiency in both tasks. To overcome these limitations, we introduce EHRMamba, a robust foundation model built on the Mamba architecture. EHRMamba can process sequences up to 300% longer than previous models due to its linear computational cost. We also introduce a novel approach to Multitask Prompted Finetuning (MPF) for EHR data, which enables EHRMamba to simultaneously learn multiple clinical tasks in a single finetuning phase, significantly enhancing deployment and cross-task generalization. Furthermore, our model leverages the HL7 FHIR data standard to simplify integration into existing hospital systems. Alongside EHRMamba, we open-source Odyssey, a toolkit designed to support the development and deployment of EHR foundation models, with an emphasis on data standardization and interpretability. Our evaluations on the MIMIC-IV dataset demonstrate that EHRMamba advances state-of-the-art performance across 6 major clinical tasks and excels in EHR forecasting, marking a significant leap forward in the field.
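
The scaling argument in the abstract can be illustrated with a rough sketch. It assumes a unit-cost model in which full self-attention performs one operation per token pair and a linear-time scan performs one state update per token; the function names and the baseline length of 512 are hypothetical, and "300% longer" is read here as four times the baseline length. These are not measurements from the paper.

```python
def attention_pairwise_ops(seq_len: int) -> int:
    """Token-pair interactions in full self-attention: n * n (unit-cost assumption)."""
    return seq_len * seq_len


def linear_scan_ops(seq_len: int) -> int:
    """State updates in a linear-time scan: one per token (unit-cost assumption)."""
    return seq_len


def growth_factor(cost_fn, base_len: int, new_len: int) -> float:
    """How many times more work cost_fn needs at new_len than at base_len."""
    return cost_fn(new_len) / cost_fn(base_len)


base_len = 512            # hypothetical baseline context length
long_len = base_len * 4   # "300% longer" = baseline + 3 * baseline

attention_growth = growth_factor(attention_pairwise_ops, base_len, long_len)
scan_growth = growth_factor(linear_scan_ops, base_len, long_len)

print(f"attention work grows {attention_growth:.0f}x, linear scan grows {scan_growth:.0f}x")
```

Under this simplified model, quadrupling the sequence length multiplies attention work by 16 but linear-scan work only by 4, which is the gap the abstract attributes to the Mamba backbone.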

Paper Structure

This paper contains 60 sections, 10 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Patient sequence example. Visit 1 has a procedure and two medications; visit 2, after two weeks, has three lab tests and another procedure. The concept embedding of each token is added to its attribute embeddings (type, age, time, segment, visit order, position) to encode the sequence.
  • Figure 2: EHRMamba architecture. Pretraining uses the forecasting head and finetuning uses the clinical prediction head.
  • Figure 3: Left: EHRMamba predicted tokens. Right: Average token attributions for mortality prediction. Lab tests are the most influential features in EHRMamba's assessment of patient mortality risk.
  • Figure 4: Forecasting results on six patients of the test dataset.
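
The token encoding described in the Figure 1 caption (a concept embedding added to the attribute embeddings for type, age, time, segment, visit order, and position) can be sketched as an element-wise sum of lookups. This is a minimal illustration under stated assumptions, not the authors' implementation: the `EmbeddingTable` class, the 4-dimensional vectors, the random initialization, and the example token values are all hypothetical.

```python
import random

EMBED_DIM = 4  # tiny hypothetical dimension, chosen only for readability
ATTRIBUTES = ("type", "age", "time", "segment", "visit_order", "position")


class EmbeddingTable:
    """Hypothetical lookup table: each new key gets a fixed random vector."""

    def __init__(self, dim: int, seed: int):
        self.dim = dim
        self._rng = random.Random(seed)
        self._rows = {}

    def lookup(self, key):
        # Create the row on first use, then always return the same vector.
        if key not in self._rows:
            self._rows[key] = [self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        return self._rows[key]


concept_table = EmbeddingTable(EMBED_DIM, seed=0)
attribute_tables = {
    name: EmbeddingTable(EMBED_DIM, seed=i + 1) for i, name in enumerate(ATTRIBUTES)
}


def embed_token(token: dict) -> list:
    """Add the concept embedding to every attribute embedding, element-wise."""
    vector = list(concept_table.lookup(token["concept"]))
    for name in ATTRIBUTES:
        attribute_vector = attribute_tables[name].lookup(token[name])
        vector = [v + a for v, a in zip(vector, attribute_vector)]
    return vector


# Hypothetical token: a lab test recorded during a patient's second visit.
example_token = {
    "concept": "lab_test_A",
    "type": "lab",
    "age": 63,
    "time": 14,
    "segment": 1,
    "visit_order": 2,
    "position": 5,
}

token_vector = embed_token(example_token)
print(len(token_vector))
```

Summing the embeddings keeps the token vector at a fixed width regardless of how many attributes are attached, which is the usual reason for adding rather than concatenating them.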