Weakly supervised deep learning model with size constraint for prostate cancer detection in multiparametric MRI and generalization to unseen domains

Robin Trombetta, Olivier Rouvière, Carole Lartizien

TL;DR

The method proposed by Kervadec et al. (2018), which introduces a size constraint loss to produce fine semantic segmentations of cancer lesions from weak circle-scribble annotations, performs on par with strong fully supervised baseline models, both on in-distribution validation data and on unseen test images.

Abstract

Fully supervised deep models have shown promising performance for many medical segmentation tasks. Still, the deployment of these tools in clinics is limited by the very time-consuming collection of data manually annotated by experts. Moreover, most state-of-the-art models have been trained and validated on moderately homogeneous datasets. Deep learning methods are known to degrade, often substantially, under domain or label shifts, and they are not yet built to be robust to unseen data or label distributions. This problem is particularly relevant in the clinical setting, as the deploying institutions may use scanners or acquisition protocols that differ from those used to collect the training data. In this work, we address these two challenges for the detection of clinically significant prostate cancer (csPCa) from bi-parametric MRI. We evaluate the method proposed by Kervadec et al. (2018), which introduces a size constraint loss to produce fine semantic segmentations of cancer lesions from weak circle-scribble annotations. Model performance is evaluated on two public databases (PI-CAI and Prostate158) and one private database. First, we show that the model performs on par with strong fully supervised baseline models, both on in-distribution validation data and on unseen test images. Second, we observe a performance decrease for both fully supervised and weakly supervised models when they are tested on unseen data domains. This confirms the crucial need for efficient domain adaptation methods if deep learning models are to be deployed in a clinical environment. Finally, we show that ensembling predictions from multiple training runs improves generalization performance.

Paper Structure

This paper contains 19 sections, 2 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Classification and detection performance of all models. "Reference" designates the fully supervised 3D DynUNet trained and tested on the Prostate158 or private dataset in a 5-fold cross-validation setup. See the appendix on numerical results for detailed values.
  • Figure 2: Relative change in performances on out-of-distribution test datasets. The reported values are the ratio between a model's performance on a test dataset (Prostate158 or our private dataset) and its cross-validation performance on PI-CAI.
  • Figure 3: Example prediction maps of several 3D models. More visual results can be found in the appendix on visual results. Blue denotes the prostate and red denotes clinically significant lesions.
  • Figure 4: Histograms (blue) and cumulative histograms (orange) of lesion sizes in 3D for the three datasets. The unit of lesion size is the number of voxels for a volume with a spatial spacing of $1 \times 1 \times 3$ mm$^3$. The two vertical red lines show the values of the bounds $a$ and $b$ used for the CB loss (set by grid search), which are equal to 10 and 4000, respectively.
  • Figure 5: Histograms (blue) and cumulative histograms (orange) of slicewise lesion sizes (i.e. in 2D) for the three datasets. The unit of lesion size is the number of voxels for a slice with a spatial spacing of $1 \times 1$ mm$^2$. The two vertical red lines show the values of the bounds $a$ and $b$ used for the CB loss (set by grid search), which are equal to 10 and 600, respectively.
  • ...and 4 more figures
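
The bounds $a$ and $b$ in the captions of Figures 4 and 5 limit the predicted lesion size. The following is a minimal sketch of how such a size constraint could be computed. It assumes the quadratic penalty form described by Kervadec et al. (2018), with the soft size taken as the sum of predicted foreground probabilities. The function name, argument names and NumPy implementation are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def size_constraint_penalty(probs, lower=10.0, upper=4000.0):
    """Quadratic penalty on the soft predicted lesion size (illustrative).

    probs: array of foreground probabilities in [0, 1] for one volume.
    lower, upper: bounds a and b on the lesion size, in voxels
                  (defaults are the 3D values from the Figure 4 caption).
    """
    # Soft size: sum of probabilities, a differentiable proxy for voxel count.
    soft_size = float(np.sum(probs))
    if soft_size < lower:
        return (soft_size - lower) ** 2
    if soft_size > upper:
        return (soft_size - upper) ** 2
    # Sizes inside [lower, upper] are not penalized.
    return 0.0
```

Under these assumptions, an all-background prediction is penalized for falling below $a$, a prediction whose soft size lies between the bounds incurs no penalty, and an oversized prediction is penalized in proportion to the squared excess over $b$. In training, a term like this would typically be added to a partial cross-entropy computed on the annotated scribble voxels only; the weighting between the two terms is not given in this section.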