Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics

Swarnava Bhattacharyya, Umapada Pal, Tapabrata Chakraborti

TL;DR

This work addresses the challenge of deploying powerful foundation models for skin lesion classification by incorporating predictive uncertainty quantification and demographic fairness. It combines conformal prediction with a dynamic F1-weighted sampler and leverages a DermFoundation backbone to produce per-sample uncertainty sets while guaranteeing population-level coverage. Validated on ISIC2019 and ASAN, the approach yields improvements for minority classes without sacrificing overall accuracy and reveals robust uncertainty signals across sex, age, and ethnicity. The framework is model- and task-agnostic, enabling safer clinical translation and progress toward personalized dermatology through per-patient conformal sets and transparent decision support.

Abstract

Deep learning based diagnostic AI systems that operate on medical images are beginning to match the performance of human experts. However, these data-hungry, complex systems are inherently black boxes and have therefore been slow to be adopted in high-risk applications such as healthcare. This lack of transparency is exacerbated in recent large foundation models, which are trained in a self-supervised manner on millions of data points to provide robust generalisation across a range of downstream tasks; the process that generates their embeddings is not interpretable, and hence they are not easily trusted in clinical applications. To address this timely issue, we deploy conformal analysis to quantify the predictive uncertainty of a vision transformer (ViT) based foundation model across patient demographics (sex, age and ethnicity) for the task of skin lesion classification, using several public benchmark datasets. The significant advantage of conformal analysis is that it is method-independent: it provides not only a coverage guarantee at the population level but also an uncertainty score for each individual. We use a model-agnostic dynamic F1-score-based sampling scheme during model training, which helps to mitigate class imbalance, and we investigate the effects on uncertainty quantification (UQ) with and without this bias mitigation step. We thus show how conformal UQ can be used as a fairness metric to evaluate the robustness of the feature embeddings of the foundation model (Google DermFoundation), and thereby advance the trustworthiness and fairness of clinical AI.
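
To make the two properties named in the abstract concrete (a population-level coverage guarantee and a per-individual uncertainty output), the following is a minimal sketch of standard split conformal prediction for a classifier. It is not the authors' implementation: the nonconformity score (one minus the softmax probability of the true class), the function names, and the per-group summary are illustrative assumptions, and the class probabilities are assumed to come from any classifier head trained on the foundation-model embeddings.

```python
import numpy as np


def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data (split conformal).

    cal_probs:  (n, n_classes) predicted class probabilities.
    cal_labels: (n,) integer ground-truth labels.
    alpha:      target miscoverage rate (0.1 aims for ~90% coverage).
    """
    n = len(cal_labels)
    # Assumed nonconformity score: 1 - probability assigned to the true class.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    # If the rank exceeds n, fall back to the trivial threshold (all classes kept).
    return scores[k - 1] if k <= n else 1.0


def prediction_sets(test_probs, qhat):
    """Boolean mask (n_test, n_classes); True marks a class kept in the set."""
    return test_probs >= 1.0 - qhat


def mean_set_size_by_group(sets, groups):
    """Average prediction-set size per demographic group (hypothetical summary).

    Larger sets indicate higher predictive uncertainty for that group.
    """
    sizes = sets.sum(axis=1)
    return {str(g): float(sizes[groups == g].mean()) for g in np.unique(groups)}
```

Under the usual exchangeability assumption between calibration and test samples, the sets returned by `prediction_sets` contain the true class with probability at least 1 - alpha on average over the population. The size of each individual set acts as a per-sample uncertainty signal, and comparing average set sizes across groups (for example by sex or age band) is one plausible way to probe demographic fairness.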

Paper Structure

This paper contains 12 sections, 3 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: Pipeline of the proposed system. Our AI-driven workflow can be integrated with skin cancer diagnosis systems to support manual diagnosis by classifying skin cancer from dermatological images, especially under tight time and memory constraints. In real time, it can classify image samples into skin cancer subtypes and produce prediction sets showing the guarantee associated with the most probable predictions
  • Figure 2: Detailed steps in the conformal prediction set generation process
  • Figure 3: Variation of prediction set difficulty with patient metadata from ISIC2019
  • Figure 4: A2 accuracy for different patient categories from ISIC2019 and ASAN
  • Figure 5: Violin plots for different patient categories from the ISIC2019 and the ASAN datasets showing ground-truth level confidences
  • ...and 1 more figure