Prediction of Thrombectomy Functional Outcomes using Multimodal Data
Zeynel A. Samak, Philip Clatworthy, Majid Mirmehdi
TL;DR
This work predicts functional outcome after endovascular thrombectomy (EVT) from multimodal data, combining baseline non-contrast CT (NCCT) imaging with clinical metadata. It introduces a CNN architecture with two components. An Image Feature Encoding (IFE) block applies channel-wise and spatial squeeze-and-excitation attention (cSE and sSE). An Image-Metadata Fusion (IMF) block jointly processes imaging and metadata features. The network is trained with focal loss to mitigate class imbalance. On the MR CLEAN dataset, the multimodal model outperforms unimodal baselines. It achieves 0.75 AUC for dichotomised mRS and 0.35 accuracy for seven-class mRS, with notable gains in F1-score. The findings support multimodal fusion with attention as a promising approach for EVT outcome prediction. They also point to future refinements, such as adaptive fusion and temporal imaging analysis, that may further improve predictions.
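The TL;DR states that training uses focal loss to counter class imbalance, since some mRS grades are much rarer than others. Below is a minimal sketch of the standard multi-class focal loss, FL(p_t) = -α_t (1 - p_t)^γ log p_t. The focusing parameter γ = 2 and the optional per-class weights α are assumed values for illustration, not settings reported in this section. The function names are hypothetical.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for a single sample (hypothetical helper).

    probs:  predicted class probabilities (e.g. softmax over mRS 0..6), summing to 1.
    target: integer index of the true class.
    gamma:  focusing parameter (assumed 2.0); gamma = 0 gives plain cross-entropy.
    alpha:  optional list of per-class weights (assumed; None means no weighting).
    """
    p_t = probs[target]                      # probability assigned to the true class
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)**gamma shrinks the loss of confidently correct (easy) samples
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(batch_probs, targets, gamma=2.0, alpha=None):
    """Average the per-sample focal loss over a batch."""
    losses = [focal_loss(p, t, gamma, alpha) for p, t in zip(batch_probs, targets)]
    return sum(losses) / len(losses)


# Usage: an easy sample (p_t = 0.9) versus a hard sample (p_t = 0.1)
easy = focal_loss([0.9, 0.1], 0)
hard = focal_loss([0.1, 0.9], 0)
print(easy, hard)
```

For a confidently correct prediction (p_t = 0.9), the factor (1 - p_t)^γ is 0.01, so the loss shrinks a hundredfold. A hard example (p_t = 0.1) keeps 81% of its cross-entropy. This shifts training effort towards poorly classified samples, which are often those from under-represented classes.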
Abstract
Recent randomised clinical trials have shown that patients with ischaemic stroke due to occlusion of a large intracranial blood vessel benefit from endovascular thrombectomy. However, predicting the outcome of treatment for an individual patient remains a challenge. We propose a novel deep learning approach that directly exploits multimodal data (clinical metadata, imaging data, and imaging biomarkers extracted from the images) to estimate the success of endovascular treatment. We incorporate an attention mechanism into our architecture to model global feature inter-dependencies, both channel-wise and spatially. We perform comparative experiments using unimodal and multimodal data to predict functional outcome (modified Rankin Scale score, mRS). We achieve 0.75 AUC for dichotomised mRS scores and 0.35 classification accuracy for individual mRS scores.
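The channel-wise and spatial attention described above follows the squeeze-and-excitation idea, and can be sketched in NumPy. The sketch rests on several assumptions, since this section does not give the exact layout of the IFE block:

- The feature map is a 3D volume of shape (C, D, H, W), as NCCT scans are volumetric.
- The channel branch uses a bottleneck with reduction ratio r = 2.
- The two branches are combined by element-wise addition, which is only one of several possible aggregation choices.

All function names and weights are hypothetical, and the weights are random rather than learned.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_se(x, w1, b1, w2, b2):
    """cSE sketch: squeeze space, excite channels. x has shape (C, D, H, W)."""
    z = x.mean(axis=(1, 2, 3))              # (C,) global average pooling per channel
    s = np.maximum(0.0, w1 @ z + b1)        # (C // r,) bottleneck layer + ReLU
    gate = sigmoid(w2 @ s + b2)             # (C,) per-channel weights in (0, 1)
    return x * gate[:, None, None, None]    # rescale each channel


def spatial_se(x, w, b):
    """sSE sketch: squeeze channels with a 1x1x1 convolution, excite voxels."""
    q = np.tensordot(w, x, axes=(0, 0)) + b  # (D, H, W) weighted sum over channels
    gate = sigmoid(q)                         # per-voxel weights in (0, 1)
    return x * gate[None, ...]                # rescale every voxel across channels


def scse(x, params):
    """Combine both branches by element-wise addition (assumed aggregation)."""
    return channel_se(x, *params["cse"]) + spatial_se(x, *params["sse"])


# Usage with random (untrained, hypothetical) weights
rng = np.random.default_rng(0)
C, r = 8, 2
params = {
    "cse": (rng.standard_normal((C // r, C)), np.zeros(C // r),
            rng.standard_normal((C, C // r)), np.zeros(C)),
    "sse": (rng.standard_normal(C), 0.0),
}
x = rng.standard_normal((C, 4, 5, 5))
out = scse(x, params)
print(out.shape)  # (8, 4, 5, 5)
```

Both gates depend on the whole input: the channel gate pools over every voxel, and the spatial gate mixes every channel. This is how such blocks capture global feature inter-dependencies while leaving the feature-map shape unchanged.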
