The Foundational Capabilities of Large Language Models in Predicting Postoperative Risks Using Clinical Notes
Charles Alba, Bing Xue, Joanna Abraham, Thomas Kannampallil, Chenyang Lu
TL;DR
This study investigates whether preoperative clinical notes carry predictive signal for six postoperative risks and whether large language models can exploit it. The authors compare clinically oriented pretrained LLMs against traditional word embeddings, then apply progressively stronger fine-tuning strategies: self-supervised, label-informed, and a multi-task foundation approach. Each step improves performance, and the unified foundation model raises AUROC by up to 3.6% and AUPRC by up to 2.6% over self-supervised fine-tuning. The benefits also generalize beyond a single center, including replication on MIMIC-III, and qualitative safety analyses support clinical applicability. Overall, the findings point to foundational capabilities of LLMs in perioperative risk prediction from notes and to a practical path for deploying a single multi-task model in perioperative care, subject to the limitations and future validation the authors note.
Abstract
Clinical notes recorded during a patient's perioperative journey hold immense informational value, and advances in large language models (LLMs) offer opportunities to harness it. Using 84,875 preoperative notes and their associated surgical cases from 2018 to 2021, we examine the performance of LLMs in predicting six postoperative risks under various fine-tuning strategies. Pretrained LLMs outperformed traditional word embeddings by an absolute 38.3% in AUROC and 33.2% in AUPRC. Self-supervised fine-tuning further improved AUROC by 3.2% and AUPRC by 1.5%. Incorporating labels into training increased AUROC by a further 1.8% and AUPRC by 2%. The highest performance was achieved with a unified foundation model, which improved AUROC by 3.6% and AUPRC by 2.6% compared to self-supervision. These results highlight the foundational capabilities of LLMs in predicting postoperative risks, which could potentially be beneficial when deployed for perioperative care.
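For readers less familiar with the two metrics reported above, the following is a minimal sketch of how AUROC and AUPRC can be computed from predicted risk scores for a single binary outcome. It is an illustration only and not the authors' evaluation code: the function names `auroc` and `average_precision` and the toy labels and scores are hypothetical, and AUPRC is approximated here by average precision under the assumption that no two scores are tied.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, a common step-wise estimate of AUPRC.

    Cases are ranked by descending score and precision is averaged over
    the ranks at which a positive case appears. Assumes no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive case")
    return precision_sum / hits


# Hypothetical toy data: 1 = complication occurred, 0 = it did not.
toy_labels = [0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.20, 0.80]

print("AUROC:", auroc(toy_labels, toy_scores))
print("AUPRC (average precision):", average_precision(toy_labels, toy_scores))
```

AUPRC is reported alongside AUROC because it depends on the positive rate and is therefore the more informative of the two when the predicted outcome is rare.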
