VLM Models and Automated Grading of Atopic Dermatitis

Marc Lalonde, Hamed Ghodrati

TL;DR

This study evaluates seven vision-language models (VLMs) for automating atopic dermatitis (AD) severity grading from clinical images using the EASI scoring framework. Using a purpose-built test bench and SkinCAP-derived data with dermatologist-provided EASI annotations, the authors assess zero-shot performance (notably that of GPT-4o) and explore fine-tuning of SkinGPT and PaliGemma. GPT-4o achieves the best per-symptom accuracy, but the overall MAE of zero-shot methods remains non-negligible, leaving room for improvement through few-shot learning and changes to model architecture. The work highlights both the potential and the current limitations of VLMs in dermatology, and points to future avenues such as hybrid models and more diverse, multi-annotator datasets to enable reliable, explainable AD grading.

Abstract

The task of grading atopic dermatitis (or AD, a form of eczema) from patient images is difficult even for trained dermatologists. Research on automating this task has progressed in recent years with the development of deep learning solutions; however, the rapid evolution of multimodal models and more specifically vision-language models (VLMs) opens the door to new possibilities in terms of explainable assessment of medical images, including dermatology. This report describes experiments carried out to evaluate the ability of seven VLMs to assess the severity of AD on a set of test images.
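
To make the two metrics named in the TL;DR concrete, the sketch below computes per-symptom accuracy and an overall mean absolute error (MAE) over EASI sign grades. It assumes each image carries an integer grade from 0 to 3 for the four EASI signs. The exact metric definitions used in the report are not given here, so the function names, dictionary keys, and sample grades are illustrative assumptions rather than the authors' implementation.

```python
from statistics import mean

# The four clinical signs graded 0-3 in EASI (key names are illustrative).
SIGNS = ("erythema", "edema_papulation", "excoriation", "lichenification")


def per_sign_accuracy(truth, pred):
    """Fraction of images whose predicted grade matches the reference, per sign."""
    return {
        sign: mean(1.0 if t[sign] == p[sign] else 0.0 for t, p in zip(truth, pred))
        for sign in SIGNS
    }


def overall_mae(truth, pred):
    """Mean absolute grade error pooled over all images and all signs."""
    return mean(abs(t[sign] - p[sign]) for t, p in zip(truth, pred) for sign in SIGNS)


# Hypothetical grades for two images (not data from the report).
truth = [
    {"erythema": 2, "edema_papulation": 1, "excoriation": 0, "lichenification": 3},
    {"erythema": 1, "edema_papulation": 1, "excoriation": 2, "lichenification": 0},
]
pred = [
    {"erythema": 2, "edema_papulation": 2, "excoriation": 0, "lichenification": 1},
    {"erythema": 0, "edema_papulation": 1, "excoriation": 2, "lichenification": 0},
]

accuracy = per_sign_accuracy(truth, pred)
mae = overall_mae(truth, pred)
print(accuracy)
print(mae)
```

Accuracy counts only exact grade matches, whereas MAE also reflects how far off a wrong grade is, which is why a model can lead on one metric while still showing a non-negligible value on the other.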

Paper Structure

This paper contains 14 sections, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Visual examples of the EASI grading scale. Source: homeforeczema.org (easi-ug).
  • Figure 2: Descriptions of some test images before and after fine-tuning PaliGemma 2