TAFM-Net: A Novel Approach to Skin Lesion Segmentation Using Transformer Attention and Focal Modulation
Tariq M Khan, Dawn Lin, Shahzaib Iqbal, Erik Meijering
TL;DR
TAFM-Net addresses the challenge of accurate skin lesion segmentation under varied dermoscopic conditions by combining an EfficientNetV2B1 encoder with a self-adaptive transformer attention (TA) module and focal modulation (FM) in the skip connections. The architecture captures both local and global context through transformer self-attention and global spatial attention, supported by dense skip pathways and FM blocks in the decoder. A fused region-and-boundary loss with a dynamic alpha schedule guides training, yielding competitive Jaccard indices on the ISIC2016, ISIC2017, ISIC2018, and PH2 datasets along with faster convergence and favorable efficiency. By delivering high segmentation accuracy at a modest model size, the work moves toward robust, interpretable CAD systems for skin cancer screening.
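To make the idea of a fused region-and-boundary loss with a dynamic alpha concrete, the following is a minimal NumPy sketch, not the authors' formulation. Everything in it is an assumption chosen for illustration: the region term is a soft Dice loss, the boundary term is the mean absolute error restricted to pixels on the ground-truth contour, and alpha decays linearly from 1.0 to 0.5. The helper names `soft_dice_loss`, `boundary_map`, `boundary_loss`, `linear_alpha`, and `fused_loss` are hypothetical.

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-7):
    """Region term (assumed): 1 - soft Dice overlap between prediction and mask."""
    inter = np.sum(prob * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps)

def boundary_map(mask):
    """Mark pixels whose value differs from at least one 4-connected neighbour."""
    m = np.asarray(mask, dtype=bool)
    p = np.pad(m, 1, mode="edge")  # edge padding: image borders are not contours
    c = p[1:-1, 1:-1]
    diff = (c != p[:-2, 1:-1]) | (c != p[2:, 1:-1]) \
         | (c != p[1:-1, :-2]) | (c != p[1:-1, 2:])
    return diff.astype(float)

def boundary_loss(prob, target, eps=1e-7):
    """Boundary term (assumed): mean absolute error on ground-truth contour pixels."""
    b = boundary_map(target)
    return np.sum(np.abs(prob - target) * b) / (np.sum(b) + eps)

def linear_alpha(epoch, total_epochs, start=1.0, end=0.5):
    """Hypothetical schedule: alpha moves linearly from `start` to `end`."""
    t = epoch / max(total_epochs - 1, 1)
    return start + (end - start) * t

def fused_loss(prob, target, alpha):
    """Weighted sum: alpha * region term + (1 - alpha) * boundary term."""
    return alpha * soft_dice_loss(prob, target) + (1.0 - alpha) * boundary_loss(prob, target)

# Toy example: a 2x2 lesion centred in a 4x4 image.
target = np.zeros((4, 4))
target[1:3, 1:3] = 1.0
prob = np.full((4, 4), 0.5)  # an uninformative prediction
for epoch in (0, 9):
    a = linear_alpha(epoch, total_epochs=10)
    print(epoch, a, fused_loss(prob, target, a))
```

Under this assumed schedule, early epochs are driven almost entirely by the region term, and the boundary term gains weight as training proceeds. The actual schedule and loss terms used by TAFM-Net may differ.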
Abstract
Incorporating modern computer vision techniques into clinical protocols shows promise for improving skin lesion segmentation. The U-Net architecture has been a key model in this area and has been iteratively refined to address the heterogeneity of dermatologic images caused by varying clinical settings, lighting, patient attributes, and hair density. To further improve skin lesion segmentation, we developed TAFM-Net, a model that couples self-adaptive transformer attention (TA) with focal modulation (FM). Our model uses an EfficientNetV2B1 encoder that employs TA to enhance spatial and channel-related saliency, while a densely connected decoder integrates FM within the skip connections, improving feature emphasis, segmentation performance, and the interpretability that is crucial for medical image analysis. A novel dynamic loss function combines region and boundary information to guide model training. Our model achieves competitive performance, with Jaccard coefficients of 93.64%, 86.88%, and 92.88% on the ISIC2016, ISIC2017, and ISIC2018 datasets, respectively, demonstrating its potential in real-world scenarios.
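For readers unfamiliar with the reported metric, the Jaccard coefficient (intersection over union) of a predicted mask and a ground-truth mask can be computed as in the sketch below. The function name `jaccard_index` and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def jaccard_index(pred, target):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: treated here as perfect agreement (assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)

# Toy example: the prediction covers two pixels, one of which is truly lesion.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
print(jaccard_index(pred, target))  # intersection = 1, union = 2
```

A score such as 93.64% therefore means that, averaged over the test images, the overlap between predicted and annotated lesion regions is about 0.94 of their union.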
