Statistical validation of a deep learning algorithm for dental anomaly detection in intraoral radiographs using paired data

Pieter Van Leemput, Johannes Keustermans, Wouter Mollemans

TL;DR

The proposed paired data setup and statistical analysis can be used as a blueprint to thoroughly test the effect of a modality change, such as deep-learning-based detection and/or segmentation, on radiographic images.

Abstract

This article describes the clinical validation study setup, statistical analysis and results for a deep learning algorithm that detects dental anomalies in intraoral radiographic images, specifically caries, apical lesions, root canal treatment defects, marginal defects at crown restorations, periodontal bone loss and calculus. The study compares the detection performance of dentists using the deep learning algorithm to the prior performance of the same dentists evaluating the images without algorithmic assistance. Calculating the marginal profit and loss of performance from the annotated paired image data allows for a quantification of the hypothesized change in sensitivity and specificity. The statistical significance of these results is established using both McNemar's test and the binomial hypothesis test. The average sensitivity increases from $60.7\%$ to $85.9\%$, while the average specificity decreases slightly from $94.5\%$ to $92.7\%$. We prove that the increase in the area under the localization ROC curve (AUC) is significant (from $0.60$ to $0.86$ on average), with $95\%$ confidence intervals for the average AUC of $[0.54, 0.65]$ and $[0.82, 0.90]$, respectively. When using the deep learning algorithm for diagnostic guidance, the dentist can be $95\%$ confident that the average true population sensitivity lies between $79.6\%$ and $91.9\%$. The proposed paired data setup and statistical analysis can be used as a blueprint to thoroughly test the effect of a modality change, such as deep-learning-based detection and/or segmentation, on radiographic images.
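To make the paired comparison concrete, the following is a minimal sketch of an exact two-sided McNemar test on paired per-item outcomes. It is not taken from the article: the function names, the boolean encoding of "correct" decisions, and the reading of "loss" and "profit" as the two discordant-pair counts are assumptions made for illustration only.

```python
from math import comb


def discordant_counts(control, study):
    """Count discordant pairs from two equal-length lists of booleans.

    control[i] / study[i] say whether item i was judged correctly
    without / with algorithmic assistance (hypothetical encoding).
    Returns (loss, profit): correct only in control, correct only in study.
    """
    if len(control) != len(study):
        raise ValueError("paired data must have equal length")
    loss = sum(1 for c, s in zip(control, study) if c and not s)
    profit = sum(1 for c, s in zip(control, study) if s and not c)
    return loss, profit


def mcnemar_exact(loss, profit):
    """Exact two-sided McNemar p-value from the discordant-pair counts.

    Under the null hypothesis of no change, each discordant pair is
    equally likely to be a loss or a profit, so the smaller count
    follows a Binomial(n, 0.5) distribution with n = loss + profit.
    """
    n = loss + profit
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a change
    k = min(loss, profit)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Made-up toy outcomes, not data from the study.
    control = [True, False, False, True, False, False]
    study = [True, True, True, False, True, True]
    loss, profit = discordant_counts(control, study)
    print(loss, profit, mcnemar_exact(loss, profit))
```

Only the discordant pairs enter the test; items judged identically in both arms carry no information about a change, which is why a paired design can detect a shift with fewer images than two independent samples would need.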

Paper Structure (21 sections, 17 equations, 2 figures, 9 tables)

Figures (2)

  • Figure 1: Annotation setup for the control arm. The original IOR input image is shown on the left, while its manually annotated counterpart is shown on the right. The color coding of the annotated bounding boxes is as follows: caries (red), bone loss (dark blue), marginal defect (green), root canal treatment defect (yellow), calculus (light blue) and apical lesion (orange).
  • Figure 2: The LROC curves derived from the tooth-based validation results in the control and study arm for each of the six anomaly types. A selection of the confidence labels $k$ next to their corresponding discrete operating points $o_{k}$ is visualized for clarity.