Intensive Care as One Big Sequence Modeling Problem

Vadim Liventsev, Tobias Fritz

TL;DR

The paper reframes Intensive Care as a general sequence-modeling problem and introduces MIMIC-Ext-SEQ, a large, standardized ICU event-stream benchmark derived from MIMIC-IV. It formalizes healthcare trajectories as event sequences and contrasts next-event prediction for imitation with planning under a POMDP, which motivates foundation-model approaches. A full dataset-building pipeline, a clustering strategy, train/test splits, evaluation guidelines, and a simple MLP baseline demonstrate viability and establish a reproducible evaluation framework for long-horizon ICU forecasting and downstream tasks. The work aims to catalyze the development of foundation models in healthcare and outlines a concrete path toward ICU models that generalize across mortality, length-of-stay, and other outcomes. Its practical impact lies in enabling standardized benchmarks and encouraging the use of advanced architectures (Transformers, Neural CDEs, SSMs) for robust ICU prediction and decision support.

Abstract

Reinforcement Learning in Healthcare is typically concerned with narrow, self-contained tasks such as sepsis prediction or anesthesia control. However, previous research has demonstrated the potential of generalist models (the prime example being Large Language Models) to outperform task-specific approaches due to their capability for implicit transfer learning. To enable training of foundation models for Healthcare and to leverage the capabilities of state-of-the-art Transformer architectures, we propose the paradigm of Healthcare as Sequence Modeling, in which the interaction between the patient and the healthcare provider is represented as an event stream, and tasks like diagnosis and treatment selection are modeled as prediction of future events in the stream. To explore this paradigm experimentally, we develop MIMIC-SEQ, a sequence modeling benchmark derived by translating heterogeneous clinical records from the MIMIC-IV dataset into a uniform event stream format, train a baseline model, and explore its capabilities.
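
To make the event-stream framing concrete, the following is a minimal sketch in Python, not the authors' pipeline. The `Event` schema, its field names, the source labels, and the example values are all hypothetical and chosen only for illustration; the actual MIMIC-SEQ format may differ. The sketch merges records from separate sources into one chronological stream and then derives (history, next event) pairs, which is one simple way to pose prediction of future events in the stream.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """One entry in a patient's event stream (hypothetical schema)."""
    time_hours: float  # time since admission; the unit is an assumption
    source: str        # e.g. "lab" or "drug"; illustrative labels only
    code: str          # what was measured or administered
    value: float       # numeric payload


def to_event_stream(*tables: Iterable[Event]) -> List[Event]:
    """Merge per-source records into one chronologically ordered stream."""
    merged = [event for table in tables for event in table]
    # sorted() is stable, so simultaneous events keep their input order.
    return sorted(merged, key=lambda event: event.time_hours)


def next_event_pairs(stream: List[Event]) -> List[Tuple[List[Event], Event]]:
    """Build (history, target) pairs: predict event i from the events before it."""
    return [(stream[:i], stream[i]) for i in range(1, len(stream))]


# Made-up records standing in for heterogeneous clinical tables.
labs = [Event(0.5, "lab", "lactate", 2.1), Event(6.0, "lab", "lactate", 1.4)]
drugs = [Event(1.0, "drug", "norepinephrine", 0.05)]

stream = to_event_stream(labs, drugs)
pairs = next_event_pairs(stream)
```

Under this framing, a target event whose `source` is a measurement corresponds to forecasting the patient's state, while a target that is an intervention corresponds to imitating the provider's treatment choice.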

Paper Structure (15 sections, 6 equations, 6 tables)