Towards Foundation Models for Critical Care Time Series

Manuel Burger, Fedor Sergeev, Malte Londschien, Daphné Chopard, Hugo Yèche, Eike Gerdes, Polina Leshetkina, Alexander Morgenroth, Zeynep Babür, Jasmina Bogojeska, Martin Faltys, Rita Kuznetsova, Gunnar Rätsch

TL;DR

A harmonized critical care dataset for sequence modeling and transfer learning research: the first large-scale collection to include core treatment variables, and a benchmark for transfer learning across hospitals to study and address distribution shift.

Abstract

Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data, such as vital signs, lab results, and treatments in critical care, remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To use these combined datasets effectively for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, which requires harmonizing treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multivariate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advancements in transfer learning and the development of scalable, generalizable models for critical healthcare applications.

Paper Structure

This paper contains 35 sections, 5 figures, 9 tables.

Figures (5)

  • Figure 1: Visualization of harmonized and processed data by t-SNE (van der Maaten and Hinton, 2008). Each point represents a time step.
  • Figure 2: Single-center transfer performance heatmaps (AUROC). The per-task LGBM panel shows separate heatmaps for each task, while the LGBM and GRU average panels show task-averaged performance. Further results for Transformer and Mamba are shown in the additional results appendix.
  • Figure 3: Supervised fine-tuning study performed on HiRID for circulatory failure prediction (AUROC and AUPR panels) and decompensation (AUROC and AUPR panels) by progressively increasing the number of admissions used for training or fine-tuning. GRU, Mamba, and LGBM w. feat. are trained from scratch using HiRID data only. GRU/Mamba pretrained is trained on all data excluding HiRID patients. GRU/Mamba fine-tuned (head/full) is initialized with GRU/Mamba pretrained and fine-tuned either across the full network or just the single linear logit head.
  • Figure 4: Single-center transfer performance heatmaps (AUROC).
  • Figure 5: Supervised fine-tuning study performed on HiRID for respiratory failure and kidney failure by progressively increasing the number of patients shown during training or fine-tuning. GRU and LGBM w. feat. are trained from scratch using HiRID data only. GRU pretrained is trained on all data excluding HiRID patients. GRU fine-tuned (head/full) is initialized with GRU pretrained and fine-tuned either across the full network or just the single linear logit head.
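
The single-center transfer heatmaps in Figures 2 and 4 can be read as a source-by-target matrix: a model is trained on one center and scored by AUROC on every center. The sketch below illustrates only that layout and is not the paper's pipeline. The hospital names, the toy data, and the helpers `auroc`, `fit_direction_model`, and `transfer_matrix` are all hypothetical, and the one-feature "model" stands in for LGBM or GRU purely to keep the example dependency-free.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Split = Tuple[List[List[float]], List[int]]  # (feature rows, binary labels)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_direction_model(X: List[List[float]], y: List[int]) -> Callable[[List[float]], float]:
    """Toy stand-in for a real model: learns whether feature 0 is higher for positives."""
    pos = [row[0] for row, label in zip(X, y) if label == 1]
    neg = [row[0] for row, label in zip(X, y) if label == 0]
    direction = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda row: direction * row[0]


def transfer_matrix(
    datasets: Dict[str, Dict[str, Split]],
    fit: Callable[[List[List[float]], List[int]], Callable[[List[float]], float]],
) -> Dict[str, Dict[str, float]]:
    """Train on each source center's train split, evaluate on every center's test split."""
    matrix: Dict[str, Dict[str, float]] = {}
    for source, source_data in datasets.items():
        predict = fit(*source_data["train"])
        matrix[source] = {}
        for target, target_data in datasets.items():
            X_test, y_test = target_data["test"]
            matrix[source][target] = auroc(y_test, [predict(row) for row in X_test])
    return matrix


# Hypothetical toy centers with opposite label/feature relationships,
# mimicking an extreme distribution shift between hospitals.
toy = {
    "hospital_a": {
        "train": ([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1]),
        "test": ([[0.3], [0.7]], [0, 1]),
    },
    "hospital_b": {
        "train": ([[0.1], [0.2], [0.8], [0.9]], [1, 1, 0, 0]),
        "test": ([[0.3], [0.7]], [1, 0]),
    },
}

result = transfer_matrix(toy, fit_direction_model)
for source, row in result.items():
    print(source, row)
```

In this toy setup the diagonal entries (train and test on the same center) are high, while the off-diagonal entries collapse. That is the pattern a transfer heatmap is designed to expose, although real centers differ far less drastically than these two.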