VitalBench: A Rigorous Multi-Center Benchmark for Long-Term Vital Sign Prediction in Intraoperative Care
Xiuding Cai, Xueyao Wang, Sen Wang, Yaoyao Zhu, Jiao Chen, Yu Yao
TL;DR
VitalBench is a rigorous multi-center benchmark for long-term intraoperative vital sign forecasting that addresses three gaps: the lack of standardization, missing data, and limited cross-center generalization. It draws on VitalDB and MOVER-SIS data, defines three realistic evaluation tracks, and uses a masked loss. The results show that models which capture inter-variable dependencies and handle missingness directly outperform imputation-based approaches, and that larger data scale improves generalization, although cross-center gaps persist. The study establishes a practical, end-to-end framework that supports fair comparisons and highlights key design principles for clinically deployable predictive systems in perioperative care. In doing so, VitalBench aligns model evaluation with real-world clinical variability and operational constraints.
Abstract
Intraoperative monitoring and prediction of vital signs are critical for ensuring patient safety and improving surgical outcomes. Despite recent advances in deep learning models for medical time-series forecasting, several challenges persist, including the lack of standardized benchmarks, incomplete data, and limited cross-center validation. To address these challenges, we introduce VitalBench, a novel benchmark specifically designed for intraoperative vital sign prediction. VitalBench includes data from over 4,000 surgeries across two independent medical centers, offering three evaluation tracks: complete data, incomplete data, and cross-center generalization. This framework reflects the real-world complexities of clinical practice, minimizing reliance on extensive preprocessing and incorporating masked loss techniques for robust and unbiased model evaluation. By providing a standardized and unified platform for model development and comparison, VitalBench enables researchers to focus on architectural innovation while ensuring consistency in data handling. This work lays the foundation for advancing predictive models for intraoperative vital sign forecasting, ensuring that these models are not only accurate but also robust and adaptable across diverse clinical environments. Our code and data are available at https://github.com/XiudingCai/VitalBench.
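To make the idea of a masked loss concrete, the sketch below computes a mean squared error over observed entries only, so that missing vital sign readings contribute nothing to the score. This is a minimal illustration under our own assumptions, not the VitalBench implementation: the function name `masked_mse`, the use of NumPy, the `(time, variables)` array layout, and the choice to return 0.0 for a window with no observed values are all hypothetical.

```python
import numpy as np


def masked_mse(pred, target, observed):
    """Mean squared error over observed entries only (illustrative sketch).

    pred, target: float arrays of identical shape, e.g. (time, variables).
    observed: boolean array of the same shape; True where target was recorded.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    observed = np.asarray(observed, dtype=bool)

    n_observed = observed.sum()
    if n_observed == 0:
        # Assumption: a window with no observed values scores as zero.
        return 0.0

    # np.where replaces errors at missing positions (possibly NaN) with 0,
    # so placeholders in the target never leak into the sum.
    squared_error = np.where(observed, (pred - target) ** 2, 0.0)
    return float(squared_error.sum() / n_observed)


# Two time steps, two variables; one reading is missing (NaN).
target = np.array([[80.0, np.nan], [82.0, 97.0]])
pred = np.array([[81.0, 95.0], [80.0, 98.0]])
observed = ~np.isnan(target)

loss = masked_mse(pred, target, observed)  # (1 + 4 + 1) / 3
```

Dividing by the number of observed entries rather than the full array size keeps the loss comparable across windows with different amounts of missing data, which is one plausible reason such a loss supports evaluation without a prior imputation step.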
