Random Expert Sampling for Deep Learning Segmentation of Acute Ischemic Stroke on Non-contrast CT
Sophie Ostmeier, Brian Axelrod, Benjamin Pulli, Benjamin F. J. Verhaaren, Abdelkader Mahammedi, Yongkai Liu, Christian Federau, Greg Zaharchuk, Jeremy J. Heit
TL;DR
This study addresses automatic delineation of the ischemic core on non-contrast CT (NCCT) to aid acute stroke triage. It introduces random expert sampling as a training scheme for a benchmark U-Net trained on three neuroradiologists’ annotations from the DEFUSE 3 cohort, and compares it with majority-vote training and with inter-expert agreement. The model trained with random expert sampling agrees with the experts more closely than the experts agree with one another, and its ischemic core volumes correlate with final infarct volumes and clinical outcomes, performing comparably to CT perfusion in some respects. The approach could enable accurate, reliable NCCT-based triage in less specialized hospitals and expand access to endovascular therapy by reducing dependence on perfusion imaging. Bootstrap and cross-validation analyses, together with supplementary theoretical derivations for the training loss, support the method.
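As a rough illustration of the two training schemes named above, the sketch below builds a training target from three expert masks in two ways. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and it assumes that random expert sampling draws one expert uniformly at random each time a case is presented to the model, and that majority voting labels a voxel as ischemic when more than half of the experts marked it.

```python
import numpy as np

def majority_vote_target(expert_masks):
    """Fuse binary expert masks: a voxel is foreground if more than half of the experts marked it."""
    expert_masks = np.asarray(expert_masks)
    votes = expert_masks.sum(axis=0)
    return (2 * votes > expert_masks.shape[0]).astype(np.uint8)

def random_expert_target(expert_masks, rng):
    """Return the mask of one expert drawn uniformly at random (assumed sampling rule)."""
    expert_masks = np.asarray(expert_masks)
    idx = rng.integers(expert_masks.shape[0])
    return expert_masks[idx].astype(np.uint8)

# Toy 2x3 "scan" annotated by three hypothetical experts.
masks = np.array([
    [[1, 1, 0], [0, 0, 0]],
    [[1, 0, 0], [0, 1, 0]],
    [[1, 1, 1], [0, 0, 0]],
])

rng = np.random.default_rng(0)
mv = majority_vote_target(masks)        # one fixed consensus target
res = random_expert_target(masks, rng)  # one expert's mask, redrawn on every call
```

Under this assumption, the majority-vote target is the same every time a case is seen, whereas the randomly sampled target varies between draws, so over many iterations the model is exposed to every expert's annotation of each case.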
Abstract
Purpose: To evaluate multi-expert deep learning training methods for automatically quantifying ischemic brain tissue on non-contrast CT.
Materials and Methods: The data set consisted of 260 non-contrast CT scans from 233 patients with acute ischemic stroke recruited in the DEFUSE 3 trial. A benchmark U-Net was trained on the reference annotations of three experienced neuroradiologists to segment ischemic brain tissue using majority-vote and random expert sampling training schemes. We used a one-sided Wilcoxon signed-rank test on a set of segmentation metrics to compare bootstrapped point estimates of each training scheme with the inter-expert agreement, and the ratio of variance to analyze consistency. We further compared predicted volumes with the 24-hour follow-up DWI (final infarct core) in the subgroup of patients with full reperfusion, and tested the volumes for correlation with the clinical outcome (mRS after 30 and 90 days) using the Spearman method.
Results: Random expert sampling yielded a model that agreed with the experts better than the experts agreed among themselves, and better than a majority-vote model agreed with the experts (Surface Dice at a tolerance of 5 mm improved by 61% to 0.70 ± 0.03, and Dice improved by 25% to 0.50 ± 0.04). The model-predicted volume estimated the final infarct volume similarly to CT perfusion and correlated better with the clinical outcome than CT perfusion did.
Conclusion: A model trained with random expert sampling can identify the presence and location of acute ischemic brain tissue on non-contrast CT similarly to CT perfusion and with better consistency than experts. This may help secure the selection of patients eligible for endovascular treatment in less specialized hospitals.
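The Dice values and the "±" spreads reported in the Results can be illustrated with a small sketch of the Dice coefficient and a bootstrapped point estimate of its mean. This is only an assumption-laden illustration: the function names, the number of resamples, and the convention of scoring two empty masks as 1.0 are choices made here, not details taken from the study, and the per-case values are made up.

```python
import numpy as np

def dice(pred, ref):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)

def bootstrap_mean(values, n_boot=1000, seed=0):
    """Resample cases with replacement; return mean and standard deviation of the resampled means."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    return float(means.mean()), float(means.std(ddof=1))

# Toy example: the prediction marks two voxels, the reference marks one of them.
d = dice([1, 1, 0, 0], [1, 0, 0, 0])

# Hypothetical per-case Dice scores, only to show the call.
est, spread = bootstrap_mean([0.4, 0.5, 0.6, 0.5], n_boot=200)
```

In the toy example the masks share one voxel out of three marked in total, so the Dice coefficient is 2·1/(2+1) = 2/3.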
