Statistical validation of a deep learning algorithm for dental anomaly detection in intraoral radiographs using paired data

Pieter Van Leemput; Johannes Keustermans; Wouter Mollemans

TL;DR

The proposed paired data setup and statistical analysis can serve as a blueprint for thoroughly testing the effect of a modality change, such as deep-learning-based detection and/or segmentation, on radiographic images.

Abstract

This article describes the clinical validation study setup, statistical analysis and results for a deep learning algorithm that detects dental anomalies in intraoral radiographic images, specifically caries, apical lesions, root canal treatment defects, marginal defects at crown restorations, periodontal bone loss and calculus. The study compares the detection performance of dentists assisted by the deep learning algorithm with the performance of the same dentists evaluating the images without algorithmic assistance. Calculating the marginal profit and loss of performance from the annotated paired image data quantifies the hypothesized change in sensitivity and specificity. The statistical significance of these results is established using both McNemar's test and the binomial hypothesis test. The average sensitivity increases from $60.7\%$ to $85.9\%$, while the average specificity decreases slightly from $94.5\%$ to $92.7\%$. The increase in the area under the localization ROC curve (AUC) is shown to be significant (from $0.60$ to $0.86$ on average), with $95\%$ confidence intervals for the average AUC of $[0.54, 0.65]$ and $[0.82, 0.90]$, respectively. When using the deep learning algorithm for diagnostic guidance, the dentist can be $95\%$ confident that the average true population sensitivity lies between $79.6\%$ and $91.9\%$. The proposed paired data setup and statistical analysis can serve as a blueprint for thoroughly testing the effect of a modality change, such as deep-learning-based detection and/or segmentation, on radiographic images.
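To make the paired-data idea concrete, the following is a minimal sketch of an exact one-sided McNemar's test on the discordant pairs of a matched sample table. It is an illustration under stated assumptions, not the study's actual implementation: the function name `mcnemar_one_sided_exact`, the argument names `gained` and `lost`, and the example counts are all hypothetical, and the exact binomial form is only one common variant of the test.

```python
from math import comb


def mcnemar_one_sided_exact(gained: int, lost: int) -> float:
    """Exact one-sided McNemar p-value from discordant pair counts.

    gained: anomalies missed without assistance but detected with it
    lost:   anomalies detected without assistance but missed with it
    (both names are hypothetical labels for the discordant cells)

    H0: a discordant pair is equally likely to be a gain or a loss.
    H1: a gain is more likely than a loss.
    """
    n = gained + lost
    if n == 0:
        # No discordant pairs: no evidence against H0.
        return 1.0
    # P(X >= gained) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(gained, n + 1))
    return tail / 2 ** n


# Hypothetical counts, chosen only for illustration.
p_value = mcnemar_one_sided_exact(gained=30, lost=5)
print(f"one-sided exact McNemar p-value: {p_value:.3g}")
```

Only the discordant pairs enter the test; pairs on which both reading conditions agree carry no information about a change in sensitivity, which is why a paired design can detect a difference with fewer images than two independent samples.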

Paper Structure (21 sections, 17 equations, 2 figures, 9 tables)


Introduction
Materials
The deep learning network
Validation data
Control and study setup
Classification
Instance-based classification
Tooth-based classification
Performance measures
Sensitivity and specificity
Matched sample tables
ROC curves
Statistical analysis
Hypothesis testing
One-sided McNemar's test
...and 6 more sections

Figures (2)

Figure 1: Annotation setup for the control arm. The original IOR input image is shown on the left and its manually annotated counterpart on the right. The annotated bounding boxes are color-coded as follows: caries (red), bone loss (dark blue), marginal defect (green), root canal treatment defect (yellow), calculus (light blue) and apical lesion (orange).
Figure 2: The LROC curves derived from the tooth-based validation results in the control and study arm for each of the six anomaly types. A selection of the confidence labels $k$ next to their corresponding discrete operating points $o_{k}$ is visualized for clarity.
