Toward Clinically Trustworthy Deep Learning: Applying Conformal Prediction to Intracranial Hemorrhage Detection

Cooper Gamble; Shahriar Faghani; Bradley J. Erickson

TL;DR

This study addresses the trust gap in deep learning (DL) for radiology by applying Mondrian conformal prediction (MCP) to intracranial hemorrhage (ICH) detection on head CTs. A YOLOv8-based detector was trained on definite, radiologist-consensus data, calibrated with MCP, and evaluated on test, challenging, negative control, and RSNA external validation sets. MCP yielded prediction sets with statistical coverage guarantees and identified challenging cases with 99.7% accuracy while maintaining competitive detection performance. The approach demonstrates that uncertainty-aware DL can both match state-of-the-art accuracy and flag uncertain inputs for expert review, advancing practical deployment in radiology workflows. The work also provides an open, deployable MCP toolkit and suggests future extensions to 3D models and broader validation to support clinically trustworthy AI adoption.

Abstract

As deep learning (DL) continues to demonstrate its ability in radiological tasks, it is critical that clinical DL solutions are designed with safety in mind. One of the principal concerns in the clinical adoption of DL tools is trust. This study aims to apply conformal prediction as a step toward trustworthiness for DL in radiology. This is a retrospective study of 491 non-contrast head CTs from the CQ500 dataset, in which three senior radiologists annotated slices containing intracranial hemorrhage (ICH). The dataset was split into definite and challenging subsets, where challenging images were defined as those on which the readers disagreed. A DL model was trained on 146 patients (10,815 slices) from the definite data (training dataset) to perform ICH localization and classification for five classes of ICH. To develop an uncertainty-aware DL model, 1,546 cases of the definite data (calibration dataset) were used for Mondrian conformal prediction (MCP). The uncertainty-aware DL model was tested on 8,401 definite and challenging cases to assess its ability to identify challenging cases. After the MCP procedure, the model achieved an F1 score of 0.920 for ICH classification on the test dataset. Additionally, it correctly identified 6,837 of the 6,856 total challenging cases as challenging (99.7% accuracy). It did not incorrectly label any definite cases as challenging. The uncertainty-aware ICH detector performs on par with state-of-the-art models. MCP's performance in detecting challenging cases demonstrates that it is useful in automated ICH detection and promising for trustworthiness in radiological DL.
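The calibration step described in the abstract can be illustrated with a minimal sketch of class-conditional (Mondrian) conformal prediction. This is not the authors' implementation. The function names (`calibrate`, `p_value`, `prediction_set`), the use of a generic nonconformity score such as 1 minus the detector confidence, and the significance level `alpha` are all assumptions made for illustration. The p-value is computed from the insertion index of the test score in the sorted per-class calibration scores.

```python
import numpy as np

def calibrate(scores, labels):
    """Group calibration nonconformity scores by true class and sort each group.

    Assumption: higher score = less typical for that class (e.g. 1 - confidence).
    """
    scores = np.asarray(scores, dtype=float)
    labels = list(labels)
    calibration = {}
    for c in set(labels):
        mask = np.array([label == c for label in labels])
        calibration[c] = np.sort(scores[mask])
    return calibration

def p_value(sorted_scores, test_score):
    """Conformal p-value of a test score against one class's calibration scores."""
    n = len(sorted_scores)
    # Insertion index i: every calibration score at index >= i is >= test_score.
    i = int(np.searchsorted(sorted_scores, test_score, side="left"))
    return (n - i + 1) / (n + 1)

def prediction_set(calibration, test_scores, alpha=0.05):
    """Return every class whose p-value exceeds the significance level alpha."""
    return {
        c for c, score in test_scores.items()
        if p_value(calibration[c], score) > alpha
    }

if __name__ == "__main__":
    # Hypothetical toy calibration data, not taken from the paper.
    cal = calibrate(
        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        ["IPH", "IPH", "IPH", "IPH", "IVH", "IVH"],
    )
    print(prediction_set(cal, {"IPH": 0.25, "IVH": 0.9}, alpha=0.4))
```

Under this sketch, a prediction set that is empty or contains more than one class could be treated as a signal to route the case for expert review. The exact flagging rule used in the study is not stated in this excerpt.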

Paper Structure (17 sections, 7 figures, 6 tables)

Introduction
Materials and Methods
Data Collection
Data Organization
Image Processing
Model Development
Model Calibration
Model Performance Evaluation
Trustworthiness Evaluation
Statistical Analysis
Results
Dataset Characteristics
Model Development Cost
Model Performance: Hemorrhage Detection
Model Performance: Identification of Challenging Cases
...and 2 more sections

Figures (7)

Figure 1: High-level overview of Mondrian conformal prediction as it is applied in this study. IPH = intraparenchymal hemorrhage, IVH = intraventricular hemorrhage, HNU = heuristic notion of uncertainty, i = insertion index in sorted value array, p = conformal score.
Figure 2: Algorithm for clustering predictions with IoU thresholding and non-maximum suppression. IPH = intraparenchymal hemorrhage, IVH = intraventricular hemorrhage, SDH = subdural hemorrhage, EDH = epidural hemorrhage, SAH = subarachnoid hemorrhage, C1 = Cluster 1, C2 = Cluster 2, C3 = Cluster 3, C4 = Cluster 4, IoU = intersection over union.
Figure 3: Textual descriptions of TP, FP, TN, and FN for the three confusion matrices used during model evaluation. TP = true positive, FP = false positive, TN = true negative, FN = false negative, IoU = intersection over union.
Figure 4: Visual depictions of examples of TP, FP, TN, and FN for the three confusion matrices used during model evaluation. TP = true positive, FP = false positive, TN = true negative, FN = false negative, IPH = intraparenchymal hemorrhage, IVH = intraventricular hemorrhage, SDH = subdural hemorrhage, EDH = epidural hemorrhage, SAH = subarachnoid hemorrhage.
Figure 5: Textual descriptions of TP, FP, TN, and FN for the two confusion matrices used during external validation of the model. TP = true positive, FP = false positive, TN = true negative, FN = false negative.
...and 2 more figures
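The caption of Figure 2 refers to clustering predictions with IoU thresholding and non-maximum suppression. The sketch below shows one common greedy way to do this and is only an assumption about the general idea, not the algorithm in the figure. Predictions are visited in order of decreasing confidence, and each one joins the first cluster whose highest-confidence box overlaps it by at least `iou_threshold`. The dictionary keys (`box`, `conf`, `label`), the box format `(x1, y1, x2, y2)`, and the default threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def cluster_predictions(preds, iou_threshold=0.5):
    """Greedily group overlapping predictions.

    The first element of each cluster is its highest-confidence box, which
    plays the role of the box kept by non-maximum suppression.
    """
    clusters = []
    for pred in sorted(preds, key=lambda p: p["conf"], reverse=True):
        for cluster in clusters:
            if iou(cluster[0]["box"], pred["box"]) >= iou_threshold:
                cluster.append(pred)
                break
        else:
            # No existing cluster overlaps enough, so start a new one.
            clusters.append([pred])
    return clusters

if __name__ == "__main__":
    # Hypothetical detections on a single slice.
    preds = [
        {"box": (0, 0, 10, 10), "conf": 0.9, "label": "IPH"},
        {"box": (1, 1, 11, 11), "conf": 0.6, "label": "IVH"},
        {"box": (50, 50, 60, 60), "conf": 0.8, "label": "SDH"},
    ]
    for n, cluster in enumerate(cluster_predictions(preds), start=1):
        print(f"C{n}:", [p["label"] for p in cluster])
```

Keeping every member of a cluster, rather than discarding the suppressed boxes, preserves the competing class labels for a region, which is the information a conformal step would need to build a per-region prediction set.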
