Transformer representation learning is necessary for dynamic multi-modal physiological data on small-cohort patients
Bingxu Wang, Min Ge, Kunzhi Cai, Yuqi Zhang, Zeyi Zhou, Wenjiao Li, Yachong Guo, Wei Wang, Qing Zhou
TL;DR
This study tackles early detection of postoperative delirium (POD) in a small, multi-modal ICU cohort by introducing a Transformer-based representation learning framework tailored to dynamic physiological signals. The proposed Fusion Pathformer, an adaptation of Pathformer for multi-modal data with a TrendLoss regularizer, learns unified temporal representations that markedly improve POD prediction over traditional classifiers, achieving AUROC values above $0.95$ for POD$_2$ and POD$_3$ in TYPE I patients. TYPE II results indicate that temporal-dimension Transformers can enhance sensitivity and specificity when inter-modal relations are less informative, while patch-based multi-modal designs show trade-offs between sensitivity and specificity. The work highlights the critical role of representation learning for multi-modal medical time series in POD diagnosis, acknowledges limitations arising from the small sample size and data heterogeneity, and calls for larger public datasets to enable robust clinical deployment.
Abstract
Postoperative delirium (POD), a severe neuropsychiatric complication affecting nearly 50% of high-risk surgical patients, is defined as an acute disorder of attention and cognition. It remains significantly underdiagnosed in intensive care units (ICUs) due to subjective monitoring methods. Early and accurate diagnosis of POD is critical and achievable. Here, we propose a POD prediction framework comprising a Transformer representation model followed by traditional machine learning algorithms. Our approach utilizes multi-modal physiological data, including amplitude-integrated electroencephalography (aEEG), vital signs, electrocardiographic monitor data, and hemodynamic parameters. We curated the first multi-modal POD dataset encompassing two patient types and evaluated various Transformer architectures for representation learning. Empirical results indicate consistent improvements in sensitivity and Youden index for TYPE I patients when using Transformer representations, particularly our fusion adaptation of Pathformer. By enabling effective delirium diagnosis from postoperative day 1 to day 3, our extensive experimental findings emphasize the potential of multi-modal physiological data and highlight the necessity of representation learning via multi-modal Transformer architectures in clinical diagnosis.
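For readers unfamiliar with the Youden index reported above, the following minimal sketch shows how it follows from sensitivity and specificity via the standard definition $J = \text{sensitivity} + \text{specificity} - 1$. The function name `youden_index` and the example labels are hypothetical and purely illustrative; they are not data or code from this study.

```python
def youden_index(y_true, y_pred):
    """Return (sensitivity, specificity, J) for binary labels.

    Assumption for illustration: label 1 marks a POD-positive patient,
    label 0 a POD-negative patient.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, sensitivity + specificity - 1.0


# Made-up labels for eight hypothetical patients (not study data).
example_true = [1, 1, 1, 0, 0, 0, 0, 1]
example_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec, j = youden_index(example_true, example_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} J={j:.2f}")
```

A value of $J = 0$ corresponds to a classifier no better than chance, and $J = 1$ to perfect separation, which is why the index is a convenient single summary when both missed and false POD diagnoses matter.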
