Non-inferiority of Deep Learning Acute Ischemic Stroke Segmentation on Non-Contrast CT Compared to Expert Neuroradiologists

Sophie Ostmeier; Brian Axelrod; Benjamin F. J. Verhaaren; Soren Christensen; Abdelkader Mahammedi; Yongkai Liu; Benjamin Pulli; Li-Jia Li; Greg Zaharchuk; Jeremy J. Heit

TL;DR

This study tests whether a convolutional neural network (CNN) deep learning model can segment acute ischemic changes on non-contrast CT with an accuracy comparable to that of expert neuroradiologists.

Abstract

The purpose of this study was to determine whether a convolutional neural network (CNN) deep learning model can segment acute ischemic changes on non-contrast CT as accurately as neuroradiologists. Non-contrast CT (NCCT) examinations from 232 acute ischemic stroke patients enrolled in the DEFUSE 3 trial were included. Three experienced neuroradiologists independently segmented the hypodensity that reflected the ischemic core on each scan. The segmentations of the neuroradiologist with the most experience (expert A) served as the ground truth for deep learning model training. The segmentations of two additional neuroradiologists (experts B and C) were used for testing. The 232 studies were randomly split into training and test sets. The training set was further randomly divided into 5 folds with training and validation sets. A 3-dimensional CNN architecture was trained and optimized to predict the segmentations of expert A from NCCT. Model performance was assessed with a set of volume, overlap, and distance metrics, using non-inferiority thresholds of 20%, 3 ml, and 3 mm. The optimized model trained on expert A was compared to test experts B and C. A one-sided Wilcoxon signed-rank test was used to test whether the model-expert agreement was non-inferior to the inter-expert agreement. The final model reached 0.46 ± 0.09 Surface Dice at Tolerance 5mm and 0.47 ± 0.13 Dice on the ischemic core segmentation task when trained on expert A. For both test neuroradiologists, the model-expert agreement was non-inferior to the inter-expert agreement (p < 0.05). The CNN accurately delineates the hypodense ischemic core on NCCT in acute ischemic stroke patients with an accuracy comparable to neuroradiologists.
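The overlap and volume metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary masks flattened to equal-length sequences of 0/1 and a known voxel volume; the function names, the convention of returning 1.0 when both masks are empty, and the example voxel size are illustrative assumptions, not the authors' implementation. Surface Dice and the distance metrics are not covered here.

```python
from typing import Sequence


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumption: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def volume_ml(mask: Sequence[int], voxel_volume_mm3: float) -> float:
    """Segmented volume in millilitres (1 ml = 1000 mm^3)."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


def abs_volume_difference_ml(
    mask_a: Sequence[int], mask_b: Sequence[int], voxel_volume_mm3: float
) -> float:
    """Absolute difference between the two segmented volumes in millilitres."""
    return abs(volume_ml(mask_a, voxel_volume_mm3) - volume_ml(mask_b, voxel_volume_mm3))


# Toy example: six voxels, hypothetical 0.5 x 0.5 x 5 mm voxel size.
expert_mask = [0, 1, 1, 1, 0, 0]
model_mask = [0, 0, 1, 1, 1, 0]
voxel_mm3 = 0.5 * 0.5 * 5.0

toy_dice = dice(expert_mask, model_mask)
toy_volume_gap = abs_volume_difference_ml(expert_mask, model_mask, voxel_mm3)
```

In this toy case the two masks share two of their three voxels each, so the Dice coefficient is 2·2/(3+3), while the volume difference is zero because both masks contain the same number of voxels. This shows why the study reports overlap and volume metrics side by side: equal volumes do not imply matching locations.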

Paper Structure (17 sections, 4 figures, 2 tables)

Introduction
Results
Discussion
Conclusion
Methods
Author contributions statement
Competing Interest

Figures (4)

Figure 1: Scatter plots of volume agreement between experts and model on the test set. Top row: inter-expert and model-expert agreement for expert B. Bottom row: inter-expert and model-expert agreement for expert C. R = Spearman's correlation coefficient; gray area = 95% confidence region; black dots = individual data points. The gray areas are smaller in the model-expert comparisons (rightmost column), indicating a lower variance for the predicted volumes.
Figure 2: Qualitative analyses of experts A, B, and C and the prediction of the model. Patient 1 (left): higher-quality NCCT. Patient 2 (right): lower-quality NCCT. Experts A, B, and C agree on the location and volume of the stroke. The model prediction (last row) agrees with test experts B and C as well as it does with training expert A.
Figure 3: Flowchart of the data partition into training and test sets. The training set was further partitioned for 5-fold cross-validation to determine the best model configurations. The highest-performing fold, based on the Surface Dice at Tolerance 5mm, was then used for the final model.
Figure 4: Setup of the analysis. 1. Training: a model was trained on training expert A (see the supplementary information for training on experts B and C). 2. Test: the prediction of the model was compared to test experts B and C and tested for non-inferiority.
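The non-inferiority comparison described for the analysis setup can be sketched as a one-sided Wilcoxon signed-rank test on paired, margin-shifted differences. This is a rough illustration under stated assumptions, not the authors' code: it assumes a higher-is-better metric such as Dice (for volume or distance errors the sign of the difference would flip), uses the normal approximation without tie or continuity correction, and takes an illustrative margin and made-up per-patient scores. How the paper applies its 20%, 3 ml, and 3 mm thresholds to each metric is not specified in the text above.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def noninferiority_wilcoxon(
    model_expert: Sequence[float], inter_expert: Sequence[float], margin: float
) -> Tuple[float, float]:
    """One-sided signed-rank test of H0: median(model - inter + margin) <= 0.

    Returns (W+, p). A small p supports non-inferiority within `margin`.
    """
    if len(model_expert) != len(inter_expert):
        raise ValueError("scores must be paired per patient")
    shifted = [m - e + margin for m, e in zip(model_expert, inter_expert)]
    nonzero = [d for d in shifted if d != 0.0]  # zero differences are dropped
    n = len(nonzero)
    if n == 0:
        raise ValueError("all shifted differences are zero")
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)
    return w_plus, p_value


# Hypothetical per-patient Dice scores (not data from the study).
model_vs_b = [0.50, 0.42, 0.47, 0.55, 0.39, 0.48, 0.51, 0.44, 0.46, 0.53]
a_vs_b = [0.48, 0.45, 0.46, 0.50, 0.41, 0.47, 0.52, 0.43, 0.49, 0.51]

w_margin, p_margin = noninferiority_wilcoxon(model_vs_b, a_vs_b, margin=0.1)
w_zero, p_zero = noninferiority_wilcoxon(model_vs_b, a_vs_b, margin=0.0)
```

With the margin applied, every shifted difference in this toy data is positive, so the test rejects the inferiority hypothesis. With a margin of zero the same data give no such evidence, which illustrates that the conclusion depends on the pre-specified threshold. For real analyses, an established routine such as `scipy.stats.wilcoxon` with `alternative="greater"` would be a more robust choice than this hand-rolled approximation.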
