TESL-Net: A Transformer-Enhanced CNN for Accurate Skin Lesion Segmentation
Shahzaib Iqbal, Muhammad Zeeshan, Mehwish Mehmood, Tariq M. Khan, Imran Razzak
TL;DR
TESL-Net is a CNN-Transformer hybrid for skin lesion segmentation that embeds Swin Transformer blocks in the encoder and Bi-ConvLSTM in the skip connections to capture long-range and temporal context. It reports state-of-the-art results on the ISIC 2016, 2017, and 2018 datasets, outperforming multiple prior methods on the Jaccard index (IoU) and other metrics. Experiments indicate robustness to artefacts and irregular lesion shapes, using a practical 256×256 input size and no data augmentation. The work advances automated dermoscopic image analysis and offers a framework for extending transformer-enhanced CNNs to related medical-imaging segmentation tasks.
Abstract
Early detection of skin cancer relies on precise segmentation of skin lesions in dermoscopic images. This task is challenging because lesions have irregular shapes and lack sharp borders, and because images contain artefacts such as marker colours and hair follicles. Recent methods for melanoma segmentation are based on U-Nets and fully convolutional networks (FCNs). As these models grow deeper, they can suffer from vanishing gradients and parameter redundancy, which may lower the Jaccard index of the segmentation model. In this study, we introduce TESL-Net, a novel network for skin lesion segmentation. TESL-Net is a hybrid network that combines the local features of a CNN encoder-decoder architecture with long-range and temporal dependencies captured by bidirectional convolutional long short-term memory (Bi-ConvLSTM) networks and a Swin transformer. This enables the model to account for the uncertainty of segmentation over time and to capture contextual channel relationships in the data. We evaluated TESL-Net on three commonly used skin lesion segmentation datasets (ISIC 2016, ISIC 2017, and ISIC 2018). Empirical results show that TESL-Net achieves state-of-the-art performance, with a significantly higher Jaccard index than competing methods.
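Since the Jaccard index is the headline metric here, a minimal sketch of how it can be computed for a pair of binary masks may help. This is not the authors' evaluation code: the function name `jaccard_index`, the nested-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
from typing import Sequence


def jaccard_index(pred: Sequence[Sequence[int]],
                  target: Sequence[Sequence[int]]) -> float:
    """Jaccard index (IoU) between two binary masks of equal shape.

    Hypothetical helper for illustration; masks are rows of 0/1 values.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels marked as lesion in both masks
    union = 0         # pixels marked as lesion in at least one mask
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on, t_on = bool(p), bool(t)
            intersection += int(p_on and t_on)
            union += int(p_on or t_on)

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 3x4 masks: 3 overlapping lesion pixels, 5 lesion pixels in the union.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
target = [[0, 1, 1, 1],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]

print(jaccard_index(pred, target))  # 0.6
```

In practice a predicted probability map would first be thresholded into a binary mask; the threshold used is not stated in this section.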
