An Attention-Guided Deep Learning Approach for Classifying 39 Skin Lesion Types
Sauda Adiv Hanum, Ashim Dey, Muhammad Ashad Kabir
TL;DR
The paper tackles large-scale multiclass skin lesion classification by curating a 39-class dataset from five public sources and evaluating five deep learning models on it. It augments these models with two attention mechanisms, Efficient Channel Attention (ECA) and the Convolutional Block Attention Module (CBAM), and shows that the Vision Transformer with CBAM performs best, reaching 93.46% accuracy and an AUC of 0.99. Dataset balancing (130 images per class), an augmentation strategy, and a multi-metric evaluation framework contribute to robust, generalizable results across diverse lesion types. The work highlights the promise of attention-guided transformers for dermatology, while acknowledging limitations: misclassification among visually similar diseases, the need for multimodal data, and the need for further efficiency improvements before clinical deployment.
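The balancing step mentioned above can be illustrated with a minimal sketch. It assumes that classes with more than 130 images are randomly downsampled and that smaller classes are padded with resampled images flagged for augmentation; the summary does not spell out the authors' exact procedure, so this is only one plausible reading. The function name `balance_classes`, the class labels, and the file names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

TARGET_PER_CLASS = 130  # per-class image count reported in the TL;DR


def balance_classes(
    paths_by_class: Dict[str, List[str]],
    target: int = TARGET_PER_CLASS,
    seed: int = 0,
) -> Dict[str, List[Tuple[str, bool]]]:
    """Return exactly `target` (path, needs_augmentation) pairs per class.

    ASSUMPTION: large classes are randomly downsampled, and small classes
    keep all originals plus resampled copies flagged for augmentation.
    The paper's actual procedure may differ.
    """
    rng = random.Random(seed)
    balanced: Dict[str, List[Tuple[str, bool]]] = {}
    for label, paths in paths_by_class.items():
        if not paths:
            raise ValueError(f"class {label!r} has no images")
        if len(paths) >= target:
            # Downsample without replacement; no augmentation needed.
            chosen = rng.sample(paths, target)
            balanced[label] = [(p, False) for p in chosen]
        else:
            # Keep every original, then pad with copies to be augmented.
            originals = [(p, False) for p in paths]
            missing = target - len(paths)
            extras = [(rng.choice(paths), True) for _ in range(missing)]
            balanced[label] = originals + extras
    return balanced


# Hypothetical class labels and file names, for illustration only.
demo = {
    "class_a": [f"a_{i}.jpg" for i in range(400)],
    "class_b": [f"b_{i}.jpg" for i in range(45)],
}
balanced_demo = balance_classes(demo)
```

Entries flagged `True` would then be passed through an augmentation pipeline (for example flips or rotations) so that the padded copies are not exact duplicates of their source images.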
Abstract
The skin, as the largest organ of the human body, is vulnerable to a diverse array of conditions collectively known as skin lesions, which encompass various dermatoses. Diagnosing these lesions presents significant challenges for medical practitioners because the visual differences between them are subtle and often imperceptible to the naked eye. While not all skin lesions are life-threatening, certain types can act as early indicators of severe diseases, including skin cancers, underscoring the critical need for timely and accurate diagnostic methods. Deep learning algorithms have demonstrated remarkable potential in facilitating the early detection and prognosis of skin lesions. This study advances the field by curating a comprehensive and diverse dataset comprising 39 categories of skin lesions, synthesized from five publicly available datasets. Using this dataset, the performance of five state-of-the-art deep learning models (MobileNetV2, Xception, InceptionV3, EfficientNetB1, and Vision Transformer) is rigorously evaluated. To enhance the accuracy and robustness of these models, two attention mechanisms, Efficient Channel Attention (ECA) and the Convolutional Block Attention Module (CBAM), are incorporated into their architectures. Comprehensive evaluation across multiple performance metrics reveals that the Vision Transformer model integrated with CBAM outperforms the others, achieving an accuracy of 93.46%, precision of 94%, recall of 93%, F1-score of 93%, and specificity of 93.67%. These results underscore the significant potential of the proposed system in supporting medical professionals with accurate and efficient prognostic tools for diagnosing a broad spectrum of skin lesions. The dataset and code used in this study can be found at https://github.com/akabircs/Skin-Lesions-Classification.
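For readers unfamiliar with how the reported metrics extend to a multiclass setting, the following is a minimal sketch that derives accuracy, precision, recall, F1-score, and specificity from a confusion matrix. It assumes one-vs-rest counts per class and macro averaging; the abstract does not state which averaging scheme the authors used, so the values it reports may have been computed differently. The function name `multiclass_metrics` and the 3-class matrix are illustrative only and are not data from the paper.

```python
from typing import Dict, List


def multiclass_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Macro-averaged metrics from a square confusion matrix.

    cm[i][j] is the number of samples with true class i predicted as class j.
    ASSUMPTION: one-vs-rest counts per class, then an unweighted mean.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    precisions, recalls, f1s, specificities = [], [], [], []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp   # true other, predicted k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
        specificities.append(specificity)

    return {
        "accuracy": sum(cm[k][k] for k in range(n)) / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        "specificity": sum(specificities) / n,
    }


# Illustrative 3-class confusion matrix (made-up counts, not from the paper).
demo_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
demo_metrics = multiclass_metrics(demo_cm)
```

The same function applies unchanged to a 39 x 39 confusion matrix produced by any of the evaluated models.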
