Transformer-CNN Fused Architecture for Enhanced Skin Lesion Segmentation

Siddharth Tiwari

TL;DR

The paper addresses the challenge of skin lesion segmentation by combining CNN-based local feature extraction with transformer-based global context modeling in a parallel dual-branch architecture. A novel fusion module, incorporating channel and spatial attention and a Hadamard interaction, merges features from both branches with attention-guided skip connections, enabling end-to-end training without very deep networks. Ablation studies and ISIC2017 experiments show competitive performance (IoU/Jaccard up to 0.795, Dice 0.872) with high pixel accuracy (0.944) and fewer parameters, highlighting computational efficiency for edge devices. This approach demonstrates the practical potential of CNN-Transformer fusion for robust medical image segmentation, offering avenues for improved generalization and interpretability in clinical workflows.
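The metrics quoted above (IoU/Jaccard, Dice, pixel accuracy) are standard overlap measures for binary masks. The sketch below shows how they are commonly computed for a single image. It is an illustration, not the paper's evaluation code: the function name `segmentation_metrics`, the nested-list mask format, and the convention of scoring two empty masks as a perfect match are all assumptions.

```python
def segmentation_metrics(pred, target):
    """Compute IoU, Dice, and pixel accuracy for two binary masks.

    pred, target: equally sized 2D lists of 0/1 values (hypothetical format).
    """
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # lesion pixel correctly predicted
            elif p and not t:
                fp += 1          # background predicted as lesion
            elif t:
                fn += 1          # lesion pixel missed
            else:
                tn += 1          # background correctly predicted

    total = tp + fp + fn + tn
    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    accuracy = (tp + tn) / total if total else 1.0
    return {"iou": iou, "dice": dice, "pixel_accuracy": accuracy}


# Toy 4x4 example: the prediction misses one lesion pixel.
pred = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
target = [[1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
metrics = segmentation_metrics(pred, target)
```

In this toy case there are 4 true positives, 1 false negative, and 11 true negatives, so IoU is 4/5, Dice is 8/9, and pixel accuracy is 15/16. Pixel accuracy is the highest of the three because the large background area counts toward it, which is typical for lesion images.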

Abstract

Medical image segmentation is important for building and improving healthcare systems, particularly for early disease detection and treatment planning. In recent years, convolutional neural networks (CNNs) and other state-of-the-art methods have greatly advanced medical image segmentation. However, CNNs struggle to learn long-range dependencies and capture global context because convolution operations are local. In this paper, we explore the use of transformers and CNNs for medical image segmentation and propose a hybrid architecture that combines the ability of transformers to capture global dependencies with the ability of CNNs to capture low-level spatial details. We compare several architectures and configurations and run multiple experiments to evaluate their effectiveness.
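To make the idea of fusing the two branches concrete, here is a minimal NumPy sketch of an attention-based fusion step that uses channel attention, spatial attention, and a Hadamard (element-wise) product. It is not the paper's module. Which branch receives which attention, the parameter-free sigmoid gates, the final concatenation, and all function names are assumptions made for illustration; a trainable version would use learned convolutions in place of the fixed pooling gates.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat):
    """Reweight each channel by a gate from its global average (feat: C x H x W)."""
    weights = sigmoid(feat.mean(axis=(1, 2)))      # shape (C,)
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """Reweight each pixel by a gate from channel-wise mean and max."""
    gate = sigmoid(feat.mean(axis=0) + feat.max(axis=0))  # shape (H, W)
    return feat * gate[None, :, :]


def fuse(cnn_feat, trans_feat):
    """Fuse same-shaped CNN and transformer feature maps (assumed layout C x H x W)."""
    if cnn_feat.shape != trans_feat.shape:
        raise ValueError("branch features must have the same shape")
    # Assumption: channel attention on the global (transformer) branch,
    # spatial attention on the local (CNN) branch.
    t = channel_attention(trans_feat)
    c = spatial_attention(cnn_feat)
    # Hadamard interaction between the raw branch features.
    interaction = cnn_feat * trans_feat
    # Concatenate along the channel axis; a learned layer would normally follow.
    return np.concatenate([t, c, interaction], axis=0)


rng = np.random.default_rng(0)
cnn_feat = rng.standard_normal((8, 16, 16))
trans_feat = rng.standard_normal((8, 16, 16))
fused = fuse(cnn_feat, trans_feat)
```

With two 8-channel inputs the fused map has 24 channels at the same spatial resolution, so a decoder can consume it in the same way as a single-branch feature map.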

Paper Structure (20 sections, 2 equations, 3 figures, 3 tables)

Introduction
Traditional Approaches to Skin Lesion Segmentation
Deep Learning for Skin Lesion Segmentation
CNNs for Skin Lesion Segmentation
Transformers for Skin Lesion Segmentation
Combining CNNs and Transformers for Skin Lesion Segmentation
Proposed Method
Modeling
Implementation
Data Selection
Data Pre-processing and Transformation
Architecture Implementation
Fusion Module Implementation
Loss Function
Implementation details
...and 5 more sections

Figures (3)

Figure 1: Logical flow of the Architecture
Figure 2: Results visualization on 4 random Images
Figure 3: Results visualization on 4 selected images with failures
