Transformer-CNN Fused Architecture for Enhanced Skin Lesion Segmentation

Siddharth Tiwari

TL;DR

The paper addresses skin lesion segmentation by combining CNN-based local feature extraction with transformer-based global context modeling in a parallel dual-branch architecture. A fusion module that uses channel attention, spatial attention, and a Hadamard (element-wise) interaction merges features from both branches with attention-guided skip connections, so the model can be trained end to end without very deep networks. Ablation studies and experiments on ISIC2017 show competitive performance (IoU/Jaccard up to 0.795, Dice 0.872, pixel accuracy 0.944) with fewer parameters, which points to computational efficiency suitable for edge devices. The results indicate that CNN-Transformer fusion is practical for robust medical image segmentation and leave room for improving generalization and interpretability in clinical workflows.
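The three metrics quoted above have standard definitions in terms of per-pixel true/false positives and negatives. The following is a minimal sketch of those definitions for a single binary mask, not the paper's evaluation code; the function name `segmentation_metrics` and the flat 0/1 list representation are assumptions made for illustration.

```python
def segmentation_metrics(pred, target):
    """Return (iou, dice, pixel_accuracy) for two flat sequences of 0/1 labels.

    Hypothetical helper: masks are assumed to be already thresholded and flattened.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")

    # Count the four confusion-matrix cells pixel by pixel.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)

    union = tp + fp + fn
    # Convention (an assumption): two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    dice = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    pixel_accuracy = (tp + tn) / len(pred)
    return iou, dice, pixel_accuracy


# Toy example: 2 true positives, 1 false positive, 1 false negative, 2 true negatives.
iou, dice, acc = segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```

For a single mask, Dice and IoU are tied by Dice = 2·IoU / (1 + IoU). Dataset-level figures need not satisfy this identity exactly when each metric is averaged across images.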

Abstract

Medical image segmentation is important for building and improving healthcare systems, particularly for early disease detection and treatment planning. In recent years, convolutional neural networks (CNNs) and other state-of-the-art methods have greatly advanced medical image segmentation. However, CNNs struggle to learn long-range dependencies and to capture global context because convolution operates on local neighborhoods. In this paper, we explore the use of transformers and CNNs for medical image segmentation and propose a hybrid architecture that combines the ability of transformers to capture global dependencies with the ability of CNNs to capture low-level spatial details. We compare several architectures and configurations and run multiple experiments to evaluate their effectiveness.
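To make the idea of fusing the two branches concrete, here is a rough NumPy sketch of one way a fusion step could combine channel attention, spatial attention, and a Hadamard product, the three ingredients named in the TL;DR. It is a sketch under stated assumptions, not the paper's module. Which branch receives which attention, the squeeze-and-excitation style channel gate, the two-scalar spatial gate, the 1x1 output projection, and all names (`fuse`, `init_params`, `w1`, `w2`, `w_avg`, `w_max`, `w_out`) are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Gate each channel of feat (C, H, W) by a weight in (0, 1)."""
    squeezed = feat.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)    # bottleneck + ReLU -> (C // r,)
    gate = sigmoid(w2 @ hidden)                # per-channel gate -> (C,)
    return feat * gate[:, None, None]


def spatial_attention(feat, w_avg, w_max):
    """Gate each spatial location of feat (C, H, W) by a weight in (0, 1)."""
    avg_map = feat.mean(axis=0)                # (H, W)
    max_map = feat.max(axis=0)                 # (H, W)
    # Simplification (assumption): mix the two maps with scalars, not a conv.
    gate = sigmoid(w_avg * avg_map + w_max * max_map)
    return feat * gate[None, :, :]


def fuse(cnn_feat, trans_feat, params):
    """Fuse same-shaped CNN and transformer feature maps of shape (C, H, W)."""
    # Assumption: channel attention on the transformer branch,
    # spatial attention on the CNN branch.
    t_hat = channel_attention(trans_feat, params["w1"], params["w2"])
    c_hat = spatial_attention(cnn_feat, params["w_avg"], params["w_max"])
    interaction = cnn_feat * trans_feat        # Hadamard (element-wise) product
    stacked = np.concatenate([t_hat, c_hat, interaction], axis=0)  # (3C, H, W)
    # A 1x1 convolution is a per-pixel linear map over channels: 3C -> C.
    return np.einsum("oc,chw->ohw", params["w_out"], stacked)


def init_params(channels, reduction, rng):
    """Random weights for illustration only (hypothetical helper)."""
    return {
        "w1": rng.standard_normal((channels // reduction, channels)) * 0.1,
        "w2": rng.standard_normal((channels, channels // reduction)) * 0.1,
        "w_avg": 1.0,
        "w_max": 1.0,
        "w_out": rng.standard_normal((channels, 3 * channels)) * 0.1,
    }


rng = np.random.default_rng(0)
cnn_feat = rng.standard_normal((8, 4, 4))      # local features from the CNN branch
trans_feat = rng.standard_normal((8, 4, 4))    # global features from the transformer
fused = fuse(cnn_feat, trans_feat, init_params(8, 2, rng))  # shape (8, 4, 4)
```

The sketch assumes both branches have already been brought to the same channel count and spatial resolution. In a trainable model the weights would be learned parameters in a deep learning framework rather than random arrays.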

Paper Structure (20 sections, 2 equations, 3 figures, 3 tables)