VitalBench: A Rigorous Multi-Center Benchmark for Long-Term Vital Sign Prediction in Intraoperative Care

Xiuding Cai, Xueyao Wang, Sen Wang, Yaoyao Zhu, Jiao Chen, Yu Yao

TL;DR

VitalBench is a multi-center benchmark for long-term intraoperative vital sign forecasting that addresses standardization, missing data, and cross-center generalization. It draws on VitalDB and MOVER-SIS data, organizes evaluation into three tracks (complete data, incomplete data, and cross-center generalization), and employs a masked loss. The results indicate that models which capture inter-variable dependencies and handle missingness directly outperform imputation-based approaches, and that larger data scale improves generalization, although cross-center gaps persist. The study provides a practical, end-to-end framework for fair model comparison and highlights key design principles for clinically deployable predictive systems in perioperative care.

Abstract

Intraoperative monitoring and prediction of vital signs are critical for ensuring patient safety and improving surgical outcomes. Despite recent advances in deep learning models for medical time-series forecasting, several challenges persist, including the lack of standardized benchmarks, incomplete data, and limited cross-center validation. To address these challenges, we introduce VitalBench, a novel benchmark specifically designed for intraoperative vital sign prediction. VitalBench includes data from over 4,000 surgeries across two independent medical centers, offering three evaluation tracks: complete data, incomplete data, and cross-center generalization. This framework reflects the real-world complexities of clinical practice, minimizing reliance on extensive preprocessing and incorporating masked loss techniques for robust and unbiased model evaluation. By providing a standardized and unified platform for model development and comparison, VitalBench enables researchers to focus on architectural innovation while ensuring consistency in data handling. This work lays the foundation for advancing predictive models for intraoperative vital sign forecasting, ensuring that these models are not only accurate but also robust and adaptable across diverse clinical environments. Our code and data are available at https://github.com/XiudingCai/VitalBench.
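
The abstract mentions masked loss techniques without giving a formula here. As a rough illustration only, the sketch below shows one common way such a loss can be computed: a mean squared error averaged over observed entries, so that missing targets contribute nothing. The function name `masked_mse`, the NumPy implementation, and the choice of squared error are assumptions made for this example, not the VitalBench code.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """Mean squared error over observed entries only (illustrative sketch).

    pred, target: arrays of the same shape, e.g. (time steps, variables).
    mask: same shape, 1 where the target was observed, 0 where it is missing.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=float)

    # Replace missing targets (possibly NaN) with 0 so NaN cannot propagate.
    safe_target = np.where(mask > 0, target, 0.0)

    # Squared error, zeroed wherever the target is missing.
    sq_err = (pred - safe_target) ** 2 * mask

    n_observed = mask.sum()
    # Assumed convention: return 0.0 when nothing was observed.
    return float(sq_err.sum() / n_observed) if n_observed > 0 else 0.0

# Hypothetical toy data: 2 time steps, 2 variables, one missing target.
pred = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.5, np.nan], [3.0, 5.0]]
mask = [[1, 0], [1, 1]]
loss = masked_mse(pred, target, mask)  # averages over the 3 observed entries
```

Normalizing by the number of observed entries, rather than by the full array size, keeps the loss comparable across samples with different missing rates.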

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Overview of the VitalBench Workflow.
  • Figure 2: Non-missing rate of variables across different datasets: VitalDB (left) and MOVER-SIS (right).
  • Figure 3: PCA visualization of representative dynamic variables across datasets. Each subplot shows the first and second principal components for a selected variable, illustrating distributional differences between VitalDB (blue) and MOVER-SIS (red).
  • Figure 4: Performance comparison of different models on datasets of varying scales.
  • Figure 5: Comparison of model performance under different data imputation strategies.
  • ...and 2 more figures