TranSOP: Transformer-based Multimodal Classification for Stroke Treatment Outcome Prediction
Zeynel A. Samak, Philip Clatworthy, Majid Mirmehdi
TL;DR
This work addresses the prediction of 90-day functional outcome after acute ischaemic stroke treatment from baseline 3D NCCT scans and clinical data. It introduces TranSOP, a transformer-based multimodal architecture that embeds NCCT patches, encodes clinical metadata, and fuses the two modalities to predict dichotomized mRS scores, with a good outcome defined as $\mathrm{mRS} \leq 2$. Evaluated on the MR CLEAN dataset with 500 patients, the TranSOP variants, especially the Swin Transformer variant, achieve the highest AUCs (up to 0.85) when clinical data are included, outperforming CNN baselines in multimodal settings. The results indicate that transformers can model long-range structure in 3D medical images and integrate heterogeneous data effectively, with potential to improve treatment decisions in acute stroke care.
Abstract
Acute ischaemic stroke, caused by an interruption in blood flow to brain tissue, is a leading cause of disability and mortality worldwide. Selecting patients for the optimal ischaemic stroke treatment is a crucial step towards a successful outcome, as the effect of treatment depends strongly on the time to treatment. We propose a transformer-based multimodal network (TranSOP) that uses clinical metadata and imaging information, acquired on hospital admission, to classify the functional outcome of stroke treatment based on the modified Rankin Scale (mRS). TranSOP includes a fusion module to efficiently combine 3D non-contrast computed tomography (NCCT) features and clinical information. In comparative experiments using unimodal and multimodal data on the MR CLEAN dataset, we achieve a state-of-the-art AUC score of 0.85.
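
To make the prediction target and the reported metric concrete, the following is a minimal sketch, not the authors' code. It dichotomizes mRS scores using the good-outcome threshold mRS ≤ 2 stated in the summary above, and computes AUC as the fraction of (good, poor) patient pairs that a model ranks correctly. The function names `dichotomize_mrs` and `roc_auc` and the example scores are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dichotomize_mrs(mrs_scores: Sequence[int], threshold: int = 2) -> List[int]:
    """Map mRS scores (0-6) to binary labels: 1 = good outcome (mRS <= threshold)."""
    labels = []
    for score in mrs_scores:
        if not 0 <= score <= 6:
            raise ValueError(f"mRS score out of range 0-6: {score}")
        labels.append(1 if score <= threshold else 0)
    return labels


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen good-outcome patient
    receives a higher predicted score than a randomly chosen poor-outcome
    patient; tied scores count as half a correct ranking."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Illustrative values only (not taken from the paper or the dataset).
example_mrs = [0, 1, 2, 3, 4, 6]
example_probs = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]  # hypothetical P(good outcome)

example_labels = dichotomize_mrs(example_mrs)
example_auc = roc_auc(example_labels, example_probs)
print(example_labels, round(example_auc, 3))
```

The pairwise form is quadratic in the number of patients. It is used here only because it mirrors the definition of AUC directly. A rank-based implementation from a standard library would be the usual choice in practice.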
