DiagSet: a dataset for prostate cancer histopathological image classification

Michał Koziarski, Bogusław Cyganek, Przemysław Niedziela, Bogusław Olborski, Zbigniew Antosz, Marcin Żydak, Bogdan Kwolek, Paweł Wąsowicz, Andrzej Bukała, Jakub Swadźba, Piotr Sitkowski

TL;DR

A novel histopathological dataset for prostate cancer detection is introduced and a machine learning framework for detection of cancerous tissue regions and prediction of scan-level diagnosis is proposed, utilizing thresholding to abstain from the decision in uncertain cases.

Abstract

Cancer diseases constitute one of the most significant societal challenges. In this paper, we introduce a novel histopathological dataset for prostate cancer detection. The proposed dataset, consisting of over 2.6 million tissue patches extracted from 430 fully annotated scans, 4675 scans with assigned binary diagnoses, and 46 scans with diagnoses independently provided by a group of histopathologists, can be found at https://github.com/michalkoziarski/DiagSet. Furthermore, we propose a machine learning framework for detection of cancerous tissue regions and prediction of scan-level diagnosis, utilizing thresholding to abstain from the decision in uncertain cases. The proposed approach, composed of ensembles of deep neural networks operating on the histopathological scans at different scales, achieves 94.6% accuracy in patch-level recognition. In scan-level diagnosis, it is compared with 9 human histopathologists and shows high statistical agreement with them.

Paper Structure

This paper contains 15 sections, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: Randomly selected samples of image patches from DiagSet-A, extracted at different magnifications (rows) and containing different classes of tissue (columns).
  • Figure 2: Relation between the percentage of tissue classified as cancerous and ground truth diagnosis given by the histopathologist: in the full range (left) and magnified in the 0-5% range (right). Note that the logarithmic scale was used.
  • Figure 3: The impact of setting the lower and upper threshold used to give a final diagnosis on the diagnosis accuracy (left) and the proportion of classified scans (right).
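
The abstract and the captions of Figures 2 and 3 describe a scan-level diagnosis that depends on the percentage of tissue classified as cancerous and on a lower and an upper threshold, with abstention in uncertain cases. A minimal sketch of such a rule is given below. It is an illustration only, not the authors' implementation: the function names, the label strings, and the default threshold values (1% and 5%) are assumptions made for this example, and the exact decision rule used in the paper may differ.

```python
from typing import Optional, Sequence


def cancerous_fraction(patch_is_cancerous: Sequence[bool]) -> float:
    """Return the fraction of tissue patches predicted as cancerous."""
    if not patch_is_cancerous:
        raise ValueError("scan contains no tissue patches")
    return sum(patch_is_cancerous) / len(patch_is_cancerous)


def diagnose_scan(
    patch_is_cancerous: Sequence[bool],
    lower: float = 0.01,  # hypothetical value, not taken from the paper
    upper: float = 0.05,  # hypothetical value, not taken from the paper
) -> Optional[str]:
    """Give a scan-level diagnosis, or None to abstain.

    Assumed rule: below `lower` the scan is labelled benign, above
    `upper` it is labelled cancerous, and anything in between is left
    undecided.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    fraction = cancerous_fraction(patch_is_cancerous)
    if fraction < lower:
        return "benign"
    if fraction > upper:
        return "cancerous"
    return None  # uncertain case: abstain from the decision
```

Under this assumed rule, widening the gap between `lower` and `upper` leaves more scans undecided, which is the kind of trade-off between diagnosis accuracy and the proportion of classified scans that Figure 3 reports.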