Fusion of Foundation and Vision Transformer Model Features for Dermatoscopic Image Classification

Amirreza Mahbod, Rupert Ecker, Ramona Woitek

TL;DR

This study addresses dermatoscopic image classification by comparing a dermatology-focused foundation model, PanDerm, against two Vision Transformer baselines. It investigates non-linear probing of frozen PanDerm embeddings using MLP, XGBoost, and TabNet classifiers, contrasted with full fine-tuning of the ViT-based models, on the HAM10000 and MSKCC datasets. A key finding is that the MLP probe on PanDerm features can match the performance of a fully fine-tuned Swin Transformer, and that fusing PanDerm with Swin yields the best results on HAM10000 and competitive results on MSKCC. The work demonstrates the value of domain-specific foundation models and fusion strategies for medical image classification, with implications for efficient deployment and room for improvement through additional models and fusion techniques.

Abstract

Accurate classification of skin lesions from dermatoscopic images is essential for the diagnosis and treatment of skin cancer. In this study, we investigate the utility of a dermatology-specific foundation model, PanDerm, in comparison with two Vision Transformer (ViT) architectures (ViT base and Swin Transformer V2 base) for the task of skin lesion classification. Using frozen features extracted from PanDerm, we apply non-linear probing with three different classifiers, namely a multi-layer perceptron (MLP), XGBoost, and TabNet. For the ViT-based models, we perform full fine-tuning to optimize classification performance. Our experiments on the HAM10000 and MSKCC datasets demonstrate that the PanDerm-based MLP model performs comparably to the fine-tuned Swin Transformer model, while fusion of PanDerm and Swin Transformer predictions leads to further performance improvements. Future work will explore additional foundation models, fine-tuning strategies, and advanced fusion techniques.
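
To make the idea of fusing model predictions concrete, the following is a minimal sketch of one common late-fusion scheme: a weighted average of the per-class probabilities produced by two classifiers. The abstract does not specify which fusion rule the authors used, so the averaging rule, the function names (fuse_probabilities, predict_class), the weight parameter, and the three-class toy probabilities are all assumptions made for illustration.

```python
from typing import List, Sequence


def fuse_probabilities(
    prob_a: Sequence[float],
    prob_b: Sequence[float],
    weight_a: float = 0.5,
) -> List[float]:
    """Weighted average of two per-class probability vectors.

    Assumption: this averaging rule is illustrative only; the source
    does not state which fusion rule was used.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]
    total = sum(fused)
    # Renormalise so the result sums to 1 even if the inputs were slightly off.
    return [p / total for p in fused]


def predict_class(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical outputs for one image over three toy classes.
probe_probs = [0.5, 0.3, 0.2]  # e.g. an MLP probe on frozen foundation-model features
swin_probs = [0.2, 0.6, 0.2]   # e.g. a fine-tuned Swin Transformer

fused = fuse_probabilities(probe_probs, swin_probs)
fused_label = predict_class(fused)
```

In this toy case the two models disagree on the top class, and the fused vector settles the disagreement in favour of the class with the larger combined support. In practice weight_a would be chosen on a validation split rather than fixed at 0.5.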

Paper Structure

This paper contains 5 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Example images from the MSKCC and HAM10000 datasets.