LSSF-Net: Lightweight Segmentation with Self-Awareness, Spatial Attention, and Focal Modulation

Hamza Farooq, Zuhair Zafar, Ahsan Saadat, Tariq M Khan, Shahzaib Iqbal, Imran Razzak

TL;DR

This work tackles the challenge of accurate skin lesion segmentation on resource-constrained devices by introducing LSSF-Net, a lightweight encoder-decoder network with booster pathways and a novel combination of attention mechanisms and channel-wise optimizations. Key innovations include Conformer-based Focal Modulation Attention (CFMA) in skip connections, Self-aware Attention Blocks (SAB), Global Spatial Attention (GSA), and a Split Channel-Shuffle (SCS) strategy, enabling strong local detail preservation and global context understanding within ~0.8M parameters. The approach achieves state-of-the-art results on ISIC 2016/2017/2018 and PH2 datasets, and demonstrates robust generalization to ultrasound datasets (BUSI and DDTI), while maintaining low computational cost (3.1 GFLOPs) and fast inference (~13.7 ms). These findings suggest a practical path toward accurate, mobile-friendly dermatology CAD support with broad cross-domain applicability and potential for real-world clinical deployment.
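The summary above names a Split Channel-Shuffle (SCS) strategy but does not spell out how it is built. As a rough illustration only, the sketch below shows the generic split-then-shuffle pattern on a list of channel indices, using the well-known ShuffleNet-style interleaving. The helper names `split_channels` and `channel_shuffle`, the group count of 2, and the 8-channel toy input are all assumptions made for this example and are not taken from the paper.

```python
def split_channels(channels, num_splits):
    """Split a list of channels into num_splits equal, contiguous parts."""
    if len(channels) % num_splits != 0:
        raise ValueError("channel count must be divisible by num_splits")
    size = len(channels) // num_splits
    return [channels[i * size:(i + 1) * size] for i in range(num_splits)]


def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle).

    Equivalent to reshaping to (groups, n // groups), transposing,
    and flattening back to a single channel axis.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


# Toy input: 8 channels identified by index (hypothetical size).
channels = list(range(8))

# Split into two halves; in a real network each half would pass
# through its own (cheaper) operation before being recombined.
first_half, second_half = split_channels(channels, 2)

# Recombine and shuffle so information mixes across the two halves.
shuffled = channel_shuffle(first_half + second_half, groups=2)
```

The point of the pattern is cost: each branch works on half the channels, and the shuffle lets later layers see features from both branches without an extra dense mixing layer.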

Abstract

Accurate segmentation of skin lesions within dermoscopic images plays a crucial role in the timely identification of skin cancer for computer-aided diagnosis on mobile platforms. However, varying lesion shapes, a lack of defined edges, and obstructions such as hair strands and marker colors make this task more complex. Additionally, skin lesions often exhibit subtle variations in texture and color that are difficult to differentiate from surrounding healthy skin, necessitating models that can capture both fine-grained details and broader contextual information. Currently, melanoma segmentation models are commonly based on fully connected networks and U-Nets. However, these models often struggle to capture the complex and varied characteristics of skin lesions, such as indistinct boundaries and diverse lesion appearances, which can lead to suboptimal segmentation performance. To address these challenges, we propose a novel lightweight network specifically designed for skin lesion segmentation on mobile devices, featuring a minimal number of learnable parameters (only 0.8 million). This network comprises an encoder-decoder architecture that incorporates conformer-based focal modulation attention, self-aware local and global spatial attention, and split channel-shuffle. The efficacy of our model has been evaluated on four well-established benchmark datasets for skin lesion segmentation: ISIC 2016, ISIC 2017, ISIC 2018, and PH2. Empirical findings substantiate its state-of-the-art performance, notably reflected in a high Jaccard index.
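The abstract reports performance through the Jaccard index, which for binary segmentation masks is the size of the intersection of the predicted and ground-truth lesion regions divided by the size of their union. The following is a minimal sketch of that computation on small hand-written masks; the function name `jaccard_index`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and do not reproduce the paper's evaluation code.

```python
def jaccard_index(pred, target):
    """Jaccard index (intersection over union) of two binary masks.

    Both masks are nested lists of 0/1 values with the same shape.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 masks: 1 marks lesion pixels, 0 marks background.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
ground_truth = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]

# 4 pixels overlap and 6 pixels are covered by either mask.
score = jaccard_index(predicted, ground_truth)
```

Because the union sits in the denominator, the score penalizes both missed lesion pixels and false positives, which is why it is a common headline metric for lesion segmentation.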

Paper Structure

This paper contains 31 sections, 27 equations, 10 figures, 11 tables, 3 algorithms.

Figures (10)

  • Figure 1: Block diagram of the proposed LSSF-Net. "CFMA" is conformer-based focal modulation attention, "SAB" is the self-aware attention block, and "GSA" is global spatial attention.
  • Figure 2: Schematic of the Conformer-based Focal Modulation Attention (CFMA). "LN" denotes layer normalization.
  • Figure 3: Visual results of the ablation study on the ISIC 2017 dataset. Columns, from left to right: (1) color image, (2) ground truth, (3) baseline network (BN), (4) BN + CFMA, (5) BN + SAB, (6) BN + CFMA + SAB, (7) BN + CFMA + SCS-SAB, and (8) BN + CFMA + SCS-SAB + Transfer Learning.
  • Figure 4: Visual comparison of the proposed LSSF-Net on the ISIC 2018 dataset (Codella et al., 2019).
  • Figure 5: Visual comparison of the proposed LSSF-Net on the ISIC 2017 dataset (Codella et al., 2018).
  • ...and 5 more figures