Empirical Validation of Conformal Prediction for Trustworthy Skin Lesions Classification

Jamil Fayyad, Shadi Alijani, Homayoun Najjaran

TL;DR

This study develops and compares Conformal Prediction, Monte Carlo Dropout, and Evidential Deep Learning approaches to uncertainty quantification in deep neural networks, and finds that conformal prediction performs robustly and consistently across diverse testing conditions.

Abstract

Background and objective: Uncertainty quantification is a pivotal field that contributes to realizing reliable and robust systems. It supports safe decisions by providing complementary information, particularly in high-risk applications. Existing studies have explored various methods that often operate under specific assumptions or require substantial modifications to the network architecture to account for uncertainties. The objective of this paper is to study Conformal Prediction, an emerging distribution-free uncertainty quantification technique, and to provide a comprehensive understanding of the advantages and limitations of various methods within the medical imaging field.

Methods: In this study, we developed Conformal Prediction, Monte Carlo Dropout, and Evidential Deep Learning approaches to assess uncertainty quantification in deep neural networks. The effectiveness of these methods is evaluated using three public medical imaging datasets focused on detecting pigmented skin lesions and blood cell types.

Results: The experimental results demonstrate a significant enhancement in uncertainty quantification with the Conformal Prediction method, which surpasses the performance of the other two methods. Furthermore, the results offer insights into the effectiveness of each uncertainty method in handling Out-of-Distribution samples from domain-shifted datasets. Our code is available at:

Conclusions: Our conclusion highlights a robust and consistent performance of conformal prediction across diverse testing conditions. This positions it as the preferred choice for decision-making in safety-critical applications.

Paper Structure (18 sections, 11 equations, 9 figures, 5 tables, 2 algorithms)

Figures (9)

  • Figure 1: Conformal Prediction Framework. The dataset is split into a training set, a calibration set, and a testing set. Conformal Prediction uses the scoring function and the calibration set to generate a quantile value $\hat{q}$. At inference, a test sample is passed to the trained network, and the output score along with the quantile value is used to generate prediction sets.
  • Figure 2: The histogram distribution of uncertainty values assigned by each uncertainty quantification algorithm to both correctly and wrongly classified samples. Results illustrate that while both MCD and EDL assign diverse values of uncertainties to wrongly classified samples, CP, on the other hand, assigns high uncertainty values to those samples.
  • Figure 3: Histogram of uncertainty values assigned to correctly classified samples and wrongly classified samples using two deep learning models, ResNet-50 and VGG-11. The three uncertainty quantification methods are MCD, EDL, and CP.
  • Figure 4: Analysis of samples with corresponding uncertainty values. Samples are categorized into correctly classified and wrongly classified groups, with their uncertainty quantified using the conformal prediction algorithm. Misclassified samples often exhibit a noticeably different skin tone compared to correctly classified ones. Additionally, misclassified samples with low uncertainty display more visible skin lesion patches compared to those with high uncertainty, possibly influencing the model's predictive confidence.
  • Figure 5: Comparison of the effect of APS vs. RAPS on uncertainty assignments for both correctly classified and misclassified samples. The figure illustrates that the smaller prediction set sizes produced by RAPS, which often indicate lower uncertainty, are generally preferred. However, challenges arise for misclassified samples, where larger set sizes, indicating increased uncertainty, may incur penalties from the regularization term.
  • ...and 4 more figures
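
The calibration and inference steps described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the simple score "1 minus the softmax probability of the true class" in place of the APS and RAPS scores named in Figure 5, and the function names `conformal_quantile` and `prediction_sets`, the miscoverage level `alpha`, and the toy arrays are all hypothetical.

```python
import numpy as np


def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Return q_hat from calibration softmax outputs (n, K) and labels (n,).

    Assumption: the score is 1 - p(true class), a simpler stand-in for
    the APS/RAPS scores mentioned in the figure captions.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples for this alpha: include every class.
        return np.inf
    return np.sort(scores)[k - 1]


def prediction_sets(test_probs, q_hat):
    """Boolean mask (m, K): class k is in the set when 1 - p_k <= q_hat."""
    return (1.0 - test_probs) <= q_hat


# Toy calibration data (hypothetical): 4 samples, 3 classes.
cal_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
])
cal_labels = np.array([0, 1, 2, 1])
q_hat = conformal_quantile(cal_probs, cal_labels, alpha=0.5)

# Toy test data (hypothetical): one confident and one ambiguous sample.
test_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.50, 0.45, 0.05],
])
sets = prediction_sets(test_probs, q_hat)
set_sizes = sets.sum(axis=1)  # larger sets signal higher uncertainty
```

In this sketch the size of each prediction set plays the role of the uncertainty value: the confident sample receives a single-class set, while the ambiguous one receives two classes.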