TAFM-Net: A Novel Approach to Skin Lesion Segmentation Using Transformer Attention and Focal Modulation

Tariq M Khan, Dawn Lin, Shahzaib Iqbal, Erik Meijering

TL;DR

TAFM-Net targets accurate skin lesion segmentation under varied dermoscopic conditions by combining an EfficientNetV2B1 encoder with a self-aware transformer attention module and focal modulation (FM) in the skip connections. The architecture captures local and global context through transformer self-attention (TSA) and global spatial attention (GSA), supported by dense skip pathways and FM blocks in the decoder. A fused region-and-boundary loss with a dynamic alpha schedule guides training. The model reports competitive Jaccard indices on the ISIC2016, ISIC2017, ISIC2018, and PH2 datasets, along with faster convergence and favorable efficiency. By pairing high segmentation accuracy with a modest model size, the work moves toward robust, interpretable CAD systems for skin cancer screening.
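To make the idea of a dynamically weighted region-and-boundary loss concrete, here is a minimal sketch of how two loss terms could be blended with a weight that changes over training. The summary does not give the actual formula, so everything here is an assumption for illustration: the function names `alpha_schedule` and `fused_loss`, the linear schedule, the start and end values, and the convex-combination form are all hypothetical and not taken from the paper.

```python
def alpha_schedule(epoch, total_epochs, alpha_start=1.0, alpha_end=0.5):
    """Hypothetical linear schedule for the blending weight alpha.

    Assumption: alpha moves linearly from alpha_start (first epoch)
    to alpha_end (last epoch). The paper's real schedule may differ.
    """
    if total_epochs <= 1:
        return alpha_end
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return alpha_start + (alpha_end - alpha_start) * progress


def fused_loss(region_loss, boundary_loss, alpha):
    """Assumed convex combination of a region term and a boundary term."""
    return alpha * region_loss + (1.0 - alpha) * boundary_loss


# Example: the region term dominates early, the boundary term gains weight later.
for epoch in (0, 5, 9):
    a = alpha_schedule(epoch, total_epochs=10)
    print(epoch, round(a, 3), round(fused_loss(0.2, 0.4, a), 3))
```

In a schedule of this kind, the region term stabilizes early training and the boundary term refines contours later; whether TAFM-Net follows that ordering is not stated in this summary.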

Abstract

Incorporating modern computer vision techniques into clinical protocols shows promise in improving skin lesion segmentation. The U-Net architecture has been a key model in this area, iteratively improved to address challenges arising from the heterogeneity of dermatologic images due to varying clinical settings, lighting, patient attributes, and hair density. To further improve skin lesion segmentation, we developed TAFM-Net, an innovative model leveraging self-adaptive transformer attention (TA) coupled with focal modulation (FM). Our model integrates an EfficientNetV2B1 encoder, which employs TA to enhance spatial and channel-related saliency, while a densely connected decoder integrates FM within skip connections, enhancing feature emphasis, segmentation performance, and interpretability crucial for medical image analysis. A novel dynamic loss function amalgamates region and boundary information, guiding effective model training. Our model achieves competitive performance, with Jaccard coefficients of 93.64%, 86.88% and 92.88% in the ISIC2016, ISIC2017 and ISIC2018 datasets, respectively, demonstrating its potential in real-world scenarios.
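The Jaccard coefficient reported above is the standard intersection-over-union between a predicted mask and the ground-truth mask. The sketch below shows that definition for binary masks using NumPy. The function name `jaccard_index` and the `eps` smoothing term are illustrative choices, not details from the paper, and the paper's evaluation code may threshold or average differently.

```python
import numpy as np


def jaccard_index(pred, target, eps=1e-7):
    """Jaccard index (intersection over union) of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps avoids 0/0 when both masks are empty (treated as perfect agreement).
    return float((intersection + eps) / (union + eps))


# Toy 2x3 masks: 2 pixels overlap, 4 pixels are in the union.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(jaccard_index(pred, target))  # approximately 0.5
```

A dataset-level score such as the percentages quoted in the abstract would typically be an average of this quantity over test images, expressed as a percentage.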

Paper Structure

This paper contains 30 sections, 23 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Examples of challenging skin-lesion dermoscopy images: (a) variation in appearance, (b) presence of artifacts, (c) multiple lesions, (d) low contrast, (e) presence of hair.
  • Figure 2: Design of the proposed transformer attention focal modulation network (TAFM-Net).
  • Figure 3: Design of the self-aware attention module. Top: The transformer self-attention (TSA) block. Bottom: The global spatial attention (GSA) block. $F_\text{in}\in \mathbb{R}^{8\times 8 \times 1280}$ is the output of the encoder block and $F_\text{out}\in \mathbb{R}^{8\times 8\times 1280}$ is the input of the decoder block.
  • Figure 4: Design of the focal modulation block.
  • Figure 5: Design of the decoder block.
  • ...and 7 more figures
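
The Figure 3 caption states that the attention module maps an 8×8×1280 feature map to an output of the same shape. As a rough illustration of how self-attention can preserve that shape, the sketch below flattens the map into 64 tokens, applies generic single-head scaled dot-product attention, and reshapes back. This is not the authors' TSA block: the projection size `d`, the random weights, and the function name `self_attention` are assumptions made only to show the tensor bookkeeping.

```python
import numpy as np


def self_attention(tokens, w_q, w_k, w_v):
    """Generic single-head scaled dot-product self-attention."""
    q = tokens @ w_q                      # (n_tokens, d)
    k = tokens @ w_k                      # (n_tokens, d)
    v = tokens @ w_v                      # (n_tokens, channels)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Numerically stable row-wise softmax.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
h, w, c = 8, 8, 1280                      # shape of F_in from the caption
f_in = rng.standard_normal((h, w, c)).astype(np.float32)

# Treat each spatial position as one token: (8, 8, 1280) -> (64, 1280).
tokens = f_in.reshape(h * w, c)

d = 64                                    # hypothetical query/key width
scale = c ** -0.5
w_q = rng.standard_normal((c, d)).astype(np.float32) * scale
w_k = rng.standard_normal((c, d)).astype(np.float32) * scale
w_v = rng.standard_normal((c, c)).astype(np.float32) * scale

attended, weights = self_attention(tokens, w_q, w_k, w_v)

# Restore the spatial layout so the result matches the shape of F_out.
f_out = attended.reshape(h, w, c)
print(f_out.shape, weights.shape)
```

Each of the 64 rows in `weights` sums to one, so every spatial position receives a weighted mix of features from all positions, which is how attention of this kind supplies global context at the bottleneck.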