Optimizing Skin Lesion Classification via Multimodal Data and Auxiliary Task Integration

Mahapara Khurshid, Mayank Vatsa, Richa Singh

TL;DR

The paper addresses accurate skin lesion classification from smartphone images in resource-limited settings. It proposes a multimodal framework that fuses image features with clinical and demographic metadata and augments learning with an auxiliary super-resolution task. Visual and textual features are extracted by separate encoders and fused at the feature level as $FV_{Final} = FV_{Image} \odot FV_{Meta}$. The model is trained with $L_{final} = \alpha L_{wce} + \beta L_{SR}$, where $\alpha=0.5$ and $\beta=1.0$, and the super-resolution upscale factor is 2. The approach reports state-of-the-art performance on PAD-UFES20: the best model reaches $BACC=0.832$, $ACC=0.849$, and $AUC=0.960$, with per-class metrics indicating strong specificity and substantial sensitivity. Because it relies only on readily accessible smartphone images and metadata, the method could improve early screening and diagnosis in underserved regions, and it opens avenues for extending auxiliary-task learning to other medical modalities.
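The two formulas above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the summary does not spell out $L_{wce}$ or $L_{SR}$, so the code assumes a class-weighted cross-entropy for the former and a mean squared error for the latter. The function names, class weights, and toy values are all hypothetical.

```python
import math


def hadamard_fuse(fv_image, fv_meta):
    """Element-wise product: FV_Final = FV_Image ⊙ FV_Meta."""
    if len(fv_image) != len(fv_meta):
        raise ValueError("feature vectors must have the same length")
    return [a * b for a, b in zip(fv_image, fv_meta)]


def weighted_cross_entropy(probs, target, class_weights):
    """Assumed form of L_wce for one sample: -w[target] * log(p[target])."""
    return -class_weights[target] * math.log(probs[target])


def mean_squared_error(pred, ref):
    """Assumed stand-in for L_SR; the summary does not name the SR loss."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


def final_loss(l_wce, l_sr, alpha=0.5, beta=1.0):
    """L_final = alpha * L_wce + beta * L_SR, with the weights from the text."""
    return alpha * l_wce + beta * l_sr


# Toy values, purely illustrative.
fused = hadamard_fuse([1.0, 2.0, 3.0], [0.5, 0.0, 2.0])
l_wce = weighted_cross_entropy([0.25, 0.5, 0.25], target=1,
                               class_weights=[1.0, 2.0, 1.0])
l_sr = mean_squared_error([0.0, 1.0], [0.0, 0.0])
total = final_loss(l_wce, l_sr)
```

Note that the element-wise product requires both encoders to emit vectors of the same length, and a zero in the metadata vector suppresses the matching image feature, so the metadata effectively acts as a gate on the visual representation.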

Abstract

The rising global prevalence of skin conditions, some of which can escalate to life-threatening stages if not diagnosed and treated in time, presents a significant healthcare challenge. This issue is particularly acute in remote areas where limited access to healthcare often results in delayed treatment, allowing skin diseases to advance to more critical stages. One of the primary challenges in diagnosing skin diseases is their low inter-class variation: many exhibit similar visual characteristics, which makes accurate classification difficult. This research introduces a novel multimodal method for classifying skin lesions, integrating smartphone-captured images with essential clinical and demographic information. This approach mimics the diagnostic process employed by medical professionals. A distinctive aspect of this method is the integration of an auxiliary task focused on super-resolution image prediction. This component plays a crucial role in refining visual details and enhancing feature extraction, leading to improved differentiation between classes and, consequently, a more effective model overall. The experimental evaluations have been conducted using the PAD-UFES20 dataset, applying various deep-learning architectures. The results of these experiments demonstrate not only the effectiveness of the proposed method but also its potential applicability in under-resourced healthcare environments.

Paper Structure (9 sections, 5 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Examples of skin lesion images [pacheco2020pad].
  • Figure 2: A schematic diagram illustrating the training and testing process of the proposed approach. The auxiliary task learning guides the visual feature extractor to refine the extracted features.
  • Figure 3: Confusion matrix for the best-performing model.
  • Figure 4: AUC for the best-performing model.
  • Figure 5: t-SNE plots of the proposed model and the SOTA model.