An interpretable machine learning system for colorectal cancer diagnosis from pathology slides

Pedro C. Neto, Diana Montezuma, Sara P. Oliveira, Domingos Oliveira, João Fraga, Ana Monteiro, João Monteiro, Liliana Ribeiro, Sofia Gonçalves, Stefan Reinhard, Inti Zlobec, Isabel M. Pinto, Jaime S. Cardoso

TL;DR

We propose a deep learning system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach that leverages a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation.

Abstract

Considering the profound transformation affecting pathology practice, we aimed to develop a scalable artificial intelligence (AI) system to diagnose colorectal cancer from whole-slide images (WSI). To this end, we propose a deep learning (DL) system that learns from weak labels, a sampling strategy that reduces the number of training samples by a factor of six without compromising performance, an approach to leverage a small subset of fully annotated samples, and a prototype with explainable predictions, active learning features and parallelisation. Motivated by limitations noted in the literature, this study uses one of the largest colorectal WSI datasets, with approximately 10,500 WSIs, of which 900 are used for testing. Furthermore, the robustness of the proposed method is assessed on two additional external datasets (TCGA and PAIP) and on a dataset of samples collected directly from the proposed prototype. For each tile, the proposed method predicts a class according to the severity of the dysplasia and uses these tile-level predictions to classify the whole slide. It is trained with an interpretable mixed-supervision scheme that leverages the domain knowledge introduced by pathologists through spatial annotations. The mixed-supervision scheme enabled an intelligent sampling strategy, which was evaluated in several scenarios without compromising performance. On the internal dataset, the method achieves an accuracy of 93.44% and a sensitivity of 0.996 for distinguishing positive samples (low-grade and high-grade dysplasia) from non-neoplastic samples. Performance on the external test sets varied, with TCGA being the most challenging dataset, on which the method achieved an overall accuracy of 84.91% and a sensitivity of 0.996.
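The abstract describes tile-level severity predictions that are sampled and then combined into a slide-level label, with sensitivity computed after merging the two dysplasia grades into one positive class. The sketch below shows one way these pieces could fit together. It is a minimal sketch under stated assumptions, not the authors' implementation: the class ordering, the expected-severity ranking used to sample tiles, the max-severity aggregation rule, and all function names (`severity_scores`, `sample_tiles`, `classify_slide`, `binary_sensitivity`) are hypothetical.

```python
import numpy as np

# Hypothetical class ordering by increasing severity, assumed for illustration:
# 0 = non-neoplastic, 1 = low-grade dysplasia, 2 = high-grade dysplasia.
NON_NEOPLASTIC = 0


def severity_scores(tile_probs: np.ndarray) -> np.ndarray:
    """Expected severity of each tile, sum_c c * p(c | tile), ranging from 0 to 2.

    tile_probs has shape (n_tiles, 3); each row is a probability vector.
    """
    classes = np.arange(tile_probs.shape[1])
    return tile_probs @ classes


def sample_tiles(tile_probs: np.ndarray, n_keep: int) -> np.ndarray:
    """Indices of the n_keep tiles with the highest expected severity (assumed criterion)."""
    scores = severity_scores(tile_probs)
    # Sorting the negated scores puts the most severe tiles first; ties keep tile order.
    return np.argsort(-scores, kind="stable")[:n_keep]


def classify_slide(tile_probs: np.ndarray, n_keep: int = 50) -> int:
    """Assumed aggregation: the slide takes the most severe class among retained tiles."""
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    kept = tile_probs[sample_tiles(tile_probs, n_keep)]
    return int(kept.argmax(axis=1).max())


def binary_sensitivity(y_true, y_pred) -> float:
    """Sensitivity after merging low- and high-grade dysplasia into one positive class."""
    is_pos_true = np.asarray(y_true) != NON_NEOPLASTIC
    is_pos_pred = np.asarray(y_pred) != NON_NEOPLASTIC
    n_pos = int(is_pos_true.sum())
    if n_pos == 0:
        return float("nan")  # no positive slides: sensitivity is undefined
    return float((is_pos_true & is_pos_pred).sum() / n_pos)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1,000 synthetic tiles whose probabilities favour the non-neoplastic class.
    probs = rng.dirichlet([5.0, 1.0, 1.0], size=1000)
    print("slide label:", classify_slide(probs, n_keep=50))
    # Two of the three positive slides are detected, so sensitivity is 2/3.
    print("binary sensitivity:", binary_sensitivity([0, 1, 2, 2], [0, 1, 1, 0]))
```

Here `n_keep` plays the role of the tiles-per-slide budgets compared in Figure 1 (200, 100, 75 and 50). Taking the most severe retained class is chosen only because it is simple and mirrors the idea that the worst lesion drives the slide label; a learned aggregator or a vote over several tiles would be equally plausible.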

Paper Structure (25 sections, 6 equations, 11 figures, 10 tables)

Figures (11)

  • Figure 1: Impact of tile sampling on information loss: percentage of tiles not selected by sampling with different thresholds over the first four inference epochs. The blue, orange and green bars represent sampling strategies that retain 200, 100 and 75 tiles per slide, respectively, and the red line represents a strategy that retains 50 tiles per slide.
  • Figure 2: Precision-recall curves of three distinct models on the CRS10K test set, with the F1-score of each model indicated. The blue line represents our method trained on CRS10K, the orange line the same method trained on CRS4K, and the green line iMIL4Path.
  • Figure 3: Confidence analysis for correct and incorrect predictions on the CRS10K test set: kernel density estimates of the confidences of correct and incorrect predictions made by three distinct models on the three-class classification problem. From left to right, the plots show the proposed method trained on CRS10K, the proposed method trained on CRS4K, and iMIL4Path. In each plot, the solid blue line is the density of the confidences of correct predictions and the dashed blue line their mean confidence; the solid and dashed orange lines show the same for incorrect predictions.
  • Figure 4: Confidence analysis for correct and incorrect predictions on the prototype set: kernel density estimates of the confidences of correct and incorrect predictions made by three distinct models on the three-class classification problem. From left to right, the plots show the proposed method trained on CRS10K, the proposed method trained on CRS4K, and iMIL4Path. In each plot, the solid blue line is the density of the confidences of correct predictions and the dashed blue line their mean confidence; the solid and dashed orange lines show the same for incorrect predictions.
  • Figure 5: Accuracy versus rejection rate for the models evaluated on the CRS10K test set: accuracy as a function of the percentage of samples the model leaves unclassified. Both axes are in percent. The blue line represents our method trained on CRS10K, the orange line the same method trained on CRS4K, and the green line iMIL4Path.
  • ...and 6 more figures
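Figure 5 relates accuracy to the fraction of samples the model declines to classify. The captions do not state the rejection rule, so the sketch below assumes a common one: reject the least confident predictions first, using a per-prediction confidence such as the maximum class probability, and report accuracy on the samples that remain. The function name `accuracy_rejection_curve` and its inputs are hypothetical.

```python
import numpy as np


def accuracy_rejection_curve(confidences, correct, rejection_rates):
    """Accuracy (%) on the samples kept after rejecting the least confident ones.

    confidences: per-sample confidence scores (e.g. maximum class probability).
    correct: booleans, True where the prediction matches the label.
    rejection_rates: percentages of samples to reject, each in [0, 100].
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidences, kind="stable")  # most confident first
    n = len(confidences)
    accuracies = []
    for rate in rejection_rates:
        # Number of samples kept at this rejection rate, rounded to an integer.
        n_keep = int(round(n * (1 - rate / 100)))
        if n_keep == 0:
            accuracies.append(float("nan"))  # everything rejected: accuracy undefined
        else:
            accuracies.append(100.0 * correct[order[:n_keep]].mean())
    return np.array(accuracies)


if __name__ == "__main__":
    conf = [0.95, 0.90, 0.70, 0.55, 0.40]
    hit = [True, True, True, False, False]
    rates = [0, 20, 40, 60]
    for r, acc in zip(rates, accuracy_rejection_curve(conf, hit, rates)):
        print(f"reject {r:>3}% -> accuracy {acc:.1f}%")
```

With the toy inputs, the printed accuracies are 60.0%, 75.0%, 100.0% and 100.0%: rejecting the two least confident predictions, which are also the two wrong ones, removes every error. A well-calibrated model produces a curve that rises steadily with the rejection rate, which is why this plot complements the confidence densities in Figures 3 and 4.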