Artificial Intelligence-Based Triaging of Cutaneous Melanocytic Lesions

Ruben T. Lucassen, Nikolas Stathonikos, Gerben E. Breimer, Mitko Veta, Willeke A. M. Blokx

TL;DR

The AI model achieved a strong predictive performance in differentiating between cutaneous melanocytic lesions of high and low complexity, and the improvement in workflow efficiency due to AI-based triaging could be substantial.

Abstract

Pathologists are facing an increasing workload due to a growing volume of cases and the need for more comprehensive diagnoses. Aiming to facilitate workload reduction and faster turnaround times, we developed an artificial intelligence (AI) model for triaging cutaneous melanocytic lesions based on whole slide images. The AI model was developed and validated using a retrospective cohort from the UMC Utrecht. The dataset consisted of 52,202 whole slide images from 27,167 unique specimens, acquired from 20,707 patients. Specimens with only common nevi were assigned to the low complexity category (86.6%). In contrast, specimens with any other melanocytic lesion subtype, including non-common nevi, melanocytomas, and melanomas, were assigned to the high complexity category (13.4%). The dataset was split on patient level into a development set (80%) and test sets (20%) for independent evaluation. Predictive performance was primarily measured using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). A simulation experiment was performed to study the effect of implementing AI-based triaging in the clinic. The AI model reached an AUROC of 0.966 (95% CI, 0.960-0.972) and an AUPRC of 0.857 (95% CI, 0.836-0.877) on the in-distribution test set, and an AUROC of 0.899 (95% CI, 0.860-0.934) and an AUPRC of 0.498 (95% CI, 0.360-0.639) on the out-of-distribution test set. In the simulation experiment, using random case assignment as baseline, AI-based triaging prevented an average of 43.9 (95% CI, 36-55) initial examinations of high complexity cases by general pathologists for every 500 cases. In conclusion, the AI model achieved a strong predictive performance in differentiating between cutaneous melanocytic lesions of high and low complexity. The improvement in workflow efficiency due to AI-based triaging could be substantial.
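To make the idea behind the simulation experiment concrete, the sketch below compares random case assignment with score-based triaging on synthetic data. It is not the authors' simulation: the share of cases routed to general pathologists (`general_share`), the Gaussian score model, and all function names are assumptions introduced for illustration. Only the batch size of 500 and the 13.4% high complexity prevalence come from the abstract.

```python
import random


def count_high_seen_by_generalists(labels, order, n_general):
    """Count high complexity cases among the first n_general cases in `order`."""
    return sum(labels[i] for i in order[:n_general])


def simulate(n_batches=200, n_cases=500, prevalence=0.134,
             general_share=0.5, seed=0):
    """Return the mean number of high complexity cases per batch that
    general pathologists avoid when cases are ranked by a model score
    instead of being assigned at random.

    Assumptions (hypothetical, not from the paper):
    - a fixed fraction `general_share` of each batch goes to general pathologists;
    - model scores are drawn from N(2, 1) for high and N(0, 1) for low complexity.
    """
    rng = random.Random(seed)
    n_general = int(n_cases * general_share)
    prevented = []
    for _ in range(n_batches):
        # 1 = high complexity, 0 = low complexity
        labels = [1 if rng.random() < prevalence else 0 for _ in range(n_cases)]
        # Synthetic stand-in for the AI model's predicted scores
        scores = [rng.gauss(2.0 if y else 0.0, 1.0) for y in labels]

        # Baseline: general pathologists receive a random subset of cases
        random_order = list(range(n_cases))
        rng.shuffle(random_order)

        # Triage: general pathologists receive the lowest-scoring cases
        triage_order = sorted(range(n_cases), key=lambda i: scores[i])

        baseline = count_high_seen_by_generalists(labels, random_order, n_general)
        triaged = count_high_seen_by_generalists(labels, triage_order, n_general)
        prevented.append(baseline - triaged)
    return sum(prevented) / len(prevented)


if __name__ == "__main__":
    print(f"Mean prevented examinations per 500 cases: {simulate():.1f}")
```

The number printed depends entirely on the assumed score separation and staffing split, so it should not be compared with the 43.9 reported in the abstract.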

Paper Structure

This paper contains 19 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Predictive performance of the AI model on the in-distribution test set. (a) Receiver operating characteristic curve. (b) Precision-recall curve. (c) Reliability diagram. (d) Predicted probability histogram.
  • Figure 1: Predictive performance of the AI model on the in-distribution test set partitioned per scanner period. The test dataset was partitioned into the subset of cases for which slides were scanned using the Aperio ScanScope XT scanner (2013-2015) and the subset of cases for which slides were scanned using the Hamamatsu Nanozoomer 2.0-XR scanner (2016-2020). (a) Receiver operating characteristic curve. (b) Precision-recall curve.
  • Figure 2: Predictive performance of the AI model on the out-of-distribution test set. (a) Receiver operating characteristic curve. (b) Precision-recall curve. (c) Reliability diagram. (d) Predicted probability histogram.
  • Figure 3: Example cases from the test set. Per case from top to bottom: the most representative whole slide image for that case, the extracted tiles colored based on the attention weights assigned by the AI model, the tile with the largest attention weight at a higher magnification, and the classification result. Classification decisions were obtained using a threshold corresponding to a sensitivity of 0.95 on the in-distribution test set. Images for cases shown in the two leftmost columns were acquired using the ScanScope XT scanner (Aperio) and in the two rightmost columns using the Nanozoomer 2.0-XR scanner (Hamamatsu). (a) Correct predictions for cases from the low complexity category. From left to right: dermal nevus, compound nevus, dermal nevus, and (acral) junctional nevus. (b) Correct predictions for cases from the high complexity category. From left to right: superficial spreading melanoma, nodular melanoma, WNT-activated melanocytoma, and Spitz nevus. (c) Incorrect predictions. From left to right: dermal nevus and squamous cell carcinoma (out-of-distribution), compound nevus and scar tissue, dermal nevus with uncommon morphology (heavily pigmented and likely congenital), and WNT-activated melanocytoma.
  • Figure 4: Specimen selection.
  • ...and 1 more figure
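The Figure 3 caption mentions a decision threshold "corresponding to a sensitivity of 0.95". One common way to pick such a threshold is sketched below, assuming that a case is flagged as high complexity when its predicted score is at or above the threshold. The function names are hypothetical and the source does not describe the authors' exact procedure.

```python
import math


def threshold_for_sensitivity(scores, labels, target=0.95):
    """Return the highest threshold t such that flagging every case with
    score >= t as high complexity reaches at least `target` sensitivity.

    `labels` uses 1 for high complexity and 0 for low complexity.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("at least one high complexity case is required")
    # Minimum number of high complexity cases that must be flagged
    k = math.ceil(target * len(positives))
    # The k-th highest positive score flags at least k positives
    return positives[k - 1]


def sensitivity(scores, labels, threshold):
    """Fraction of high complexity cases with score >= threshold."""
    flagged = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    total = sum(1 for y in labels if y == 1)
    return flagged / total
```

Fixing sensitivity first and accepting whatever specificity follows suits a triage setting, where a missed high complexity case is typically costlier than an unnecessary referral.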