A Multimodal Vision Foundation Model for Clinical Dermatology

Siyuan Yan, Zhen Yu, Clare Primiero, Cristina Vico-Alonso, Zhonghua Wang, Litao Yang, Philipp Tschandl, Ming Hu, Lie Ju, Gin Tan, Vincent Tang, Aik Beng Ng, David Powell, Paul Bonnington, Simon See, Elisabetta Magnaterra, Peter Ferguson, Jennifer Nguyen, Pascale Guitera, Jose Banuls, Monika Janda, Victoria Mar, Harald Kittler, H. Peter Soyer, Zongyuan Ge

TL;DR

PanDerm is a multimodal dermatology foundation model pretrained through self-supervised learning on over 2 million real-world skin disease images from 11 clinical institutions across 4 imaging modalities. The results demonstrate its potential to improve patient care across diverse clinical scenarios and to serve as a model for developing multimodal foundation models in other medical specialties.

Abstract

Diagnosing and treating skin diseases require advanced visual skills across domains and the ability to synthesize information from multiple imaging modalities. While current deep learning models excel at specific tasks like skin cancer diagnosis from dermoscopic images, they struggle to meet the complex, multimodal requirements of clinical practice. Here, we introduce PanDerm, a multimodal dermatology foundation model pretrained through self-supervised learning on over 2 million real-world skin disease images from 11 clinical institutions across 4 imaging modalities. We evaluated PanDerm on 28 diverse benchmarks, including skin cancer screening, risk stratification, differential diagnosis of common and rare skin conditions, lesion segmentation, longitudinal monitoring, and metastasis prediction and prognosis. PanDerm achieved state-of-the-art performance across all evaluated tasks, often outperforming existing models when using only 10% of labeled data. We conducted three reader studies to assess PanDerm's potential clinical utility. PanDerm outperformed clinicians by 10.2% in early-stage melanoma detection through longitudinal analysis, improved clinicians' skin cancer diagnostic accuracy by 11% on dermoscopy images, and enhanced non-dermatologist healthcare providers' differential diagnosis by 16.5% across 128 skin conditions on clinical photographs. These results demonstrate PanDerm's potential to improve patient care across diverse clinical scenarios and serve as a model for developing multimodal foundation models in other medical specialties, potentially accelerating the integration of AI support in healthcare. The code can be found at https://github.com/SiyuanYan1/PanDerm.

Paper Structure

This paper contains 18 figures, 41 tables.

Figures (18)

  • Figure 1: Overview of this study.
  • Figure 1: Quantitative skin lesion segmentation results. a, b. Segmentation performance measured by Dice score (DSC) and Jaccard index (JAC) for PanDerm and baseline models on the ISIC2018 and HAM10000 datasets. c, d. Label efficiency generalization performance for PanDerm and baselines, showing mean DSC and JAC on the ISIC2018 and HAM10000 datasets. Error bars in a, b indicate 95% confidence intervals; bar centers represent mean values. Points in c, d denote mean values. All estimates are derived from five replicas with different seeds. Statistical significance was assessed using two-sided t-tests.
  • Figure 2: PanDerm's versatile capacity in diverse diagnosis tasks. a. Performance comparison of PanDerm versus other pretrained models on 10 pigmented skin lesion datasets across multiple centers and modalities. n: data size, c: class number. Metrics: AUROC for binary (c = 2) and weighted F1 (W_F1) score for multi-class (c > 2) datasets. Dashed lines indicate average model performance across datasets. b. Comparison between PanDerm and other pretrained models in label efficiency generalization on four representative datasets, showing performance at various training data percentages. Vertical dashed lines indicate the data quantity needed for PanDerm to match existing model performance. c. External validation for melanoma diagnosis across 7 datasets. d. Performance evaluation of general skin condition classification (up to 74 classes) using clinical images. Error bars in a, c, d show 95% CIs; bar centers in a, c, d represent mean values; dots in b represent mean values. Estimates were computed using nonparametric bootstrapping with 1000 bootstrap replicates. P-values were calculated using a two-sided t-test.
  • Figure 2: Qualitative skin lesion segmentation results. a. Comparison of PanDerm against baseline models on challenging examples from HAM10000. Red contours indicate ground truth masks, while cyan contours show model predictions. b. PanDerm segmentation results on a random selection of images from HAM10000.
  • Figure 3: Short-term lesion change detection and metastasis prognosis results. a. SDDI1 dataset statistics: ratio of changed lesions, ratio of changed malignant lesions during follow-up, and follow-up time distribution. b. Ratio of changed lesions in SDDI2 dataset. c. Ablation study on pre-processing methods: "Default" (direct input), "w/Warp" (registration only), "w/Mask" (lesion segmentation), and "w/Whole pipeline" (complete pre-processing as in Extended Data Fig. \ref{supp_change_method}). For change detection, all models were evaluated using the whole pre-processing pipeline. d. Performance of binary metastasis prediction (control vs. metastasis) by AUROC. e. Scheme of PanDerm for melanoma metastasis and prognosis prediction. f. Distribution of metastasis types in Combinemel dataset (MS represents metastasis). g. Kaplan–Meier curves for the recurrence-free interval (RFI) in invasive melanoma patients (CombinMel dataset), stratified by PanDerm prediction scores. h. Forest plots of hazard ratios for PanDerm, stratified groups in invasive melanoma patients. i. Time-dependent AUC of PanDerm vs. clinical variable score combinations. j. Time-dependent AUC comparison of PanDerm and other pretrained models. Error bars in c-d and i-j represent 95% CIs; bar centers indicate the mean value. Estimates computed with five-fold cross-validation.
  • ...and 13 more figures
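
The segmentation captions above report Dice score (DSC) and Jaccard index (JAC), and the diagnosis captions report 95% CIs from nonparametric bootstrapping with 1000 replicates. As a rough illustration of what these quantities mean, here is a minimal NumPy sketch. It is not the authors' evaluation code: the function names `dice_and_jaccard` and `bootstrap_ci`, the percentile method for the interval, the empty-mask convention, and the toy inputs are all assumptions made for this example.

```python
import numpy as np


def dice_and_jaccard(pred, truth):
    """Return (DSC, JAC) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0, 1.0
    return 2.0 * inter / total, inter / union


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-sample scores."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample indices with replacement, one row per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    means = values[idx].mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return float(values.mean()), float(lo), float(hi)


# Toy masks (hypothetical data): 2 overlapping pixels, 3 predicted, 3 true.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
dsc, jac = dice_and_jaccard(pred, truth)
print(f"DSC={dsc:.3f} JAC={jac:.3f}")

# Toy per-image scores (hypothetical) with a 95% percentile interval.
mean, lo, hi = bootstrap_ci([0.80, 0.85, 0.90, 0.82, 0.88])
print(f"mean={mean:.3f} CI=[{lo:.3f}, {hi:.3f}]")
```

For any pair of masks the two metrics are tied by JAC = DSC / (2 - DSC), so they always rank models the same way on a single image; they can differ in ranking only after averaging over a dataset.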