
Domain Adaptive Skin Lesion Classification via Conformal Ensemble of Vision Transformers

Mehran Zoravar, Shadi Alijani, Homayoun Najjaran

TL;DR

Problem: domain shift reduces the trustworthiness of skin-lesion classifiers in real-world deployment. Approach: Conformal Ensemble of Vision Transformers (CE-ViTs) ensembles ViTs trained on HAM10000, Dermofit, and Skin Cancer ISIC and applies conformal prediction to produce calibrated prediction sets with uncertainty estimates. Key results: CE-ViTs achieves a coverage of 90.38% and increases the average prediction-set size for misclassified samples to 3.075, while sets for correct predictions average about 2.74, indicating robust uncertainty handling and improved domain robustness. Significance: CE-ViTs enhances safety and reliability for medical imaging decisions across heterogeneous datasets; future work will explore weighted ensemble calibration and integrating conformal prediction into training to further boost robustness.
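
The coverage and set-size figures quoted above can be computed from a model's prediction sets roughly as follows. This is a hypothetical helper (the name conformal_metrics and the toy data are illustrative, not the authors' evaluation code), and it assumes a sample counts as misclassified when its top-1 (highest-probability) class differs from the true label.

```python
from statistics import mean
from typing import Dict, Sequence


def conformal_metrics(
    pred_sets: Sequence[Sequence[int]],
    top1: Sequence[int],
    labels: Sequence[int],
) -> Dict[str, float]:
    """Empirical coverage plus average set size for correct vs. misclassified samples."""
    # Coverage: fraction of test images whose prediction set contains the true label.
    coverage = sum(y in s for s, y in zip(pred_sets, labels)) / len(labels)
    # Split set sizes by whether the top-1 prediction was right (assumed definition).
    correct = [len(s) for s, p, y in zip(pred_sets, top1, labels) if p == y]
    wrong = [len(s) for s, p, y in zip(pred_sets, top1, labels) if p != y]
    return {
        "coverage": coverage,
        "avg_set_size_correct": float(mean(correct)) if correct else float("nan"),
        "avg_set_size_misclassified": float(mean(wrong)) if wrong else float("nan"),
    }


# Toy example with four test images (labels and sets are made up).
print(conformal_metrics([[0], [0, 1], [1, 2, 3], [2]], top1=[0, 0, 1, 2], labels=[0, 1, 3, 1]))
# → {'coverage': 0.75, 'avg_set_size_correct': 1.0, 'avg_set_size_misclassified': 2.0}
```

Reporting set size separately for the two groups is what gives the comparison meaning: larger sets on misclassified images indicate that the model signals more uncertainty precisely where its top-1 guess is wrong.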

Abstract

Exploring the trustworthiness of deep learning models is crucial, especially in critical domains such as medical imaging decision support systems. Conformal prediction has emerged as a rigorous means of providing deep learning models with reliable uncertainty estimates and safety guarantees. However, conformal prediction results suffer when the backbone model struggles in domain-shifted scenarios, such as data drawn from different sources. To address this challenge, this paper proposes a novel framework termed Conformal Ensemble of Vision Transformers (CE-ViTs), designed to enhance image classification performance by prioritizing domain adaptation and model robustness while accounting for uncertainty. The proposed method uses an ensemble of vision transformer models as the backbone, trained on diverse datasets including HAM10000, Dermofit, and Skin Cancer ISIC. This ensemble, calibrated on the combination of these datasets, aims to improve domain adaptation through conformal learning. Experimental results show that the framework achieves a high coverage rate of 90.38%, an improvement of 9.95% over the HAM10000 model. This indicates that its prediction sets are more likely to contain the true label than those of single models. Ensemble learning in CE-ViTs significantly improves conformal prediction performance, increasing the average prediction set size for challenging misclassified samples from 1.86 to 3.075.
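
A minimal sketch of how an ensemble can feed split conformal prediction, under assumptions not taken from the paper: (i) the members' softmax outputs are averaged with equal weights, (ii) the nonconformity score is one minus the probability assigned to the true class, and (iii) the miscoverage level alpha is a free parameter (alpha = 0.1 would correspond to a 90% coverage target). The function names are hypothetical; in CE-ViTs the member probabilities would come from ViTs trained on HAM10000, Dermofit, and Skin Cancer ISIC.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating split conformal prediction on top of an
# ensemble; not the authors' code. Each Probs list holds softmax scores.
Probs = List[float]


def ensemble_probs(member_probs: Sequence[Probs]) -> Probs:
    """Average member softmax outputs with equal weights (assumed combination rule)."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]


def calibrate_threshold(
    cal_probs: Sequence[Probs], cal_labels: Sequence[int], alpha: float = 0.1
) -> float:
    """Return the conformal threshold q_hat from a held-out calibration set.

    Assumed nonconformity score: s = 1 - p(true class).
    """
    scores = sorted(1.0 - probs[y] for probs, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return math.inf
    return scores[k - 1]


def prediction_set(probs: Probs, q_hat: float) -> List[int]:
    """Keep every class whose score 1 - p(class) is at most q_hat."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= q_hat]


if __name__ == "__main__":
    # Toy 3-class calibration set: true label 0 with decreasing confidence.
    cal = [[p, 1.0 - p, 0.0] for p in (0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5)]
    q_hat = calibrate_threshold(cal, [0] * len(cal), alpha=0.2)
    test_probs = ensemble_probs([[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]])
    print(q_hat, prediction_set(test_probs, q_hat))
```

Because q_hat is a single global threshold, an image on which the ensemble members disagree receives flatter averaged probabilities and therefore more classes in its set, which is the behavior the abstract reports for misclassified samples.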

Paper Structure

This paper contains 11 sections, 3 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: A pre-trained ViT architecture with a new classifier head, fine-tuned for the skin lesion dataset.
  • Figure 2: The proposed CE-ViTs framework during training and inference stages.
  • Figure 3: Coverage and prediction set size correlation under the test dataset.
  • Figure 4: The frequency and uncertainty value correlation for correct and incorrect predictions with different models.